INSTRUMENTATION FOR
RESOURCE MANAGEMENT ARCHITECTURE 
AND CORRESPONDING PROGRAMS THEREFOR 


STATEMENT OF GOVERNMENT INTEREST 

The invention described herein was made in the performance of official duties by
employees of the Department of the Navy or by researchers under contract to an agency of
the United States government and, thus, the invention disclosed herein may be manufactured,
used, and licensed by or for the Government for governmental purposes without the payment of
any royalties thereon.



BACKGROUND OF THE INVENTION 

The present invention relates generally to resource management systems by which
networked computers cooperate in performing at least one task too complex for a single
computer to perform. More specifically, the present invention relates to a resource
management system which dynamically and remotely controls networked computers to
thereby permit them to cooperate in performing tasks that are too complex for any single
computer to perform. Advantageously, software programs for converting a general purpose
computer network into a resource managed network are also disclosed.



The instant application claims priority from Provisional Patent Application Serial No.
60/207,891, which was filed on May 25, 2000. The Provisional Patent Application is
incorporated herein in its entirety by reference.

Resource Management consists of a set of cooperating computer programs that
provides an ability to dynamically allocate computing tasks to a collection of networked
computing resources (computer processors interconnected on a network) based on the
following measures:

• an application developer/user description of application computer program
performance requirements;

• measured performance of each application program;

• measured workload (CPU processing load, memory accesses, disk accesses) of each
computer in the network; and

• measured inter-computer message communication traffic on the network.

Many attempts to form distributed systems and environments have been made in the
past. For example, several companies and organizations have networked multiple computers
to form a massively parallel supercomputer of sorts. One of the best known of these efforts is
SETI@home, which is organized by SETI (Search for Extraterrestrial Intelligence), a
scientific effort aiming to determine whether there is intelligent life elsewhere in the universe.

Typically, the search consists of scanning billions of radio frequencies that flood the
universe in the hope of finding another civilization that might be transmitting a radio signal.
Most of the SETI programs in existence today, including those at UC Berkeley, build large
computers that analyze the data from the telescope in real time. None of these computers
looks very deeply at the data for weak signals, nor do they look for a large class of signal types.
This is because they are limited by the amount of computer power available
for data analysis. To extract the weakest signals, a great amount of computer power is
necessary; it would take a monstrous supercomputer to get the job done, and SETI
programs could never afford to build or buy that computing power. Thus, rather than use a
huge computer to do the job, the SETI team developed software to use thousands of small
computers, all working simultaneously on different parts of the analysis, to run the search
routine. This is accomplished with a screen saver that can retrieve a data block over the
Internet, analyze that data, and then report the results back to SETI.
Several commercial companies are developing and implementing similar capabilities.
Moreover, several companies, most notably IBM, have developed networks where each 
networked desktop computer becomes a parallel processor in a distributed computer system 
when the desktop computer is otherwise idle. 


It will be appreciated that these approaches to computing in a distributed environment 
do not provide a system that is both flexible and adaptive (or at least easily adapted) to 
changes in system configuration, performance bottlenecks, survivability requirements, 
scalability, etc. 


What is needed is a Resource Management Architecture which permits flexible
control, i.e., allowing autonomous start up and shut down of application copies on host
machines to accommodate changes in data processing requirements. What is also needed is
functionality included in the Resource Management Architecture which permits the Resource
Management Architecture to determine the near-optimal alignment of host and application
resources in the distributed environment. It would be desirable to have a user-friendly
technique with which to specify quality of service (QoS) requirements for each host, each
application, and the network in which the hosts are connected. What is also needed is
instrumentation to ensure that the specified QoS goals are being met.

SUMMARY OF THE INVENTION 

Based on the above and foregoing, it can be appreciated that there presently exists a
need in the art for a Resource Management Architecture which overcomes the
above-described deficiencies. The present invention was motivated by a desire to overcome the
drawbacks and shortcomings of the presently available technology, and thereby fulfill this
need in the art.

According to one aspect, the present invention provides a monitoring system for a
distributed environment including a plurality of hosts capable of executing multiple copies
of a scalable application, which includes a first device for generating first data corresponding
to performance of all copies of the scalable application; a second device for generating
second data corresponding to performance of all hosts in the distributed environment; and a
third device for generating performance metrics based on the first and second data.
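To make the three "devices" more concrete, the following is a minimal sketch that models them as plain Python data types and one combining function. The field names (`latency_ms`, `cpu_util`) and the particular metrics computed are assumptions for illustration only; the summary above does not specify which performance metrics are generated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AppCopySample:
    """First data: one measurement from one copy of the scalable application.
    Field names are hypothetical."""
    copy_id: str
    host: str
    latency_ms: float


@dataclass
class HostSample:
    """Second data: one measurement from one host (hypothetical fields)."""
    host: str
    cpu_util: float  # fraction of CPU in use, 0.0 - 1.0


def performance_metrics(app_samples, host_samples):
    """Third device: combine application-level and host-level data."""
    cpu_by_host = {h.host: h.cpu_util for h in host_samples}
    hosts_running_copies = {s.host for s in app_samples}
    return {
        "copies": len(app_samples),
        "mean_latency_ms": mean(s.latency_ms for s in app_samples),
        "max_latency_ms": max(s.latency_ms for s in app_samples),
        # average CPU load over the hosts that currently run at least one copy
        "mean_cpu_of_app_hosts": mean(cpu_by_host[h] for h in hosts_running_copies),
    }
```

Relating application latency to the load on the hosts that run the copies is what lets a later decision step tell "the application is slow because its hosts are busy" apart from other causes.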



BRIEF DESCRIPTION OF THE DRAWINGS 

These and various other features and aspects of the present invention will be readily
understood with reference to the following detailed description taken in conjunction with the
accompanying drawings, in which like or similar numbers are used throughout, and in which:

FIGS. 1A, 1B collectively represent a high-level block diagram of hardware and
software components implemented in the Resource Management System according to the
present invention;

FIGS. 2A, 2B collectively represent a functional block diagram of the Resource
Management Architecture according to the present invention;

FIG. 3 is a functional block diagram illustrating functional elements included in the
system specification library (SSL) implementation of the Resource Management System
according to the present invention;

FIG. 4 is a block diagram illustrating one technique for implementing the Resource
(Application) Control functional group FG5 in FIGS. 2A, 2B using discrete software
components;

FIGS. 5A, 5B represent a screen capture of a program control display FG54 generated
by the software components illustrated in FIG. 4;

FIGS. 6A, 6B represent a screen capture of a host display generated by the Resource
Management Architecture according to the present invention;

FIGS. 7A, 7B represent a screen capture of performance data regarding several of the
hosts A - N included in FIGS. 6A, 6B;

FIGS. 8A, 8B represent a screen capture of a path display generated by the Resource
Management Architecture according to the present invention;

FIGS. 9A, 9B represent a screen capture of the Resource Management Decision
Review Display, which provides a summary of allocation and reallocation actions taken by
the Resource Manager;

FIGS. 10A, 10B and 11A, 11B represent screen captures illustrating alternative,
user-configurable displays generated from received data via standardized message formats and
open interfaces;

FIGS. 12A, 12B represent a screen capture of an exemplary version of the Readiness
Display FG66 according to the present invention;

FIGS. 13A, 13B, and 13C are block diagrams which are useful in explaining various
operational and functional aspects of the Resource Management Architecture according to
the present invention;

FIG. 14 is a high-level block diagram illustrating connectivity and data flow between
the Hardware Broker and the other Resource Management and Resource
Management-related functional elements in the Resource Management Architecture; and

FIG. 15 is a high-level block diagram of a CPU-based general computer which can
act as a host in the Resource Management Architecture according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The Resource Management Architecture, which was developed, and is still being developed, by the
Naval Surface Warfare Center - Dahlgren Division (NSWC-DD), provides capabilities for
monitoring hosts, networks, and applications within a distributed computing environment.
Moreover, the Resource Management Architecture provides the capability of dynamically
allocating, and reallocating, applications to hosts as needed in order to maintain
user-specified system performance goals. Advantageously, the Resource Management Architecture
provides functionality for determining how each component within the distributed
environment is performing and what options are available for attempting to correct deficient
performance, determining the proper actions that should be taken, and enacting the
determined course of action. In addition to these capabilities, the architecture also allows for
operator control over creating and loading pre-defined static, dynamic, or combined static
and dynamic system and/or host configurations. One particularly desirable feature of the
Resource Management Architecture is that it provides capabilities for monitoring system
performance along with the ability to dynamically allocate and reallocate system resources
as required.

Before addressing the various features and aspects of the present invention, it would
be useful to establish both the terminology and the conventions that the instant application will
follow throughout. In terms of terminology, a glossary section is presented below. In terms
of conventions, this application includes information such as source code listings in an
Appendix section. Since the source code itself runs to hundreds of pages, the Appendix section
is divided into attached pages, e.g., Attached Appendix A, and an optical disk section, e.g.,
CD-Appendix N. Thus, while the appendices are listed in order, the reader must look to the
signaling language to determine whether any particular appendix is actually provided in
printed form.




API

Application Programming Interface - A set of subroutines or
functions that a program, or application, can call to invoke some
functionality contained in another software or hardware component.
The Windows API consists of more than 1,000 functions that
programs written in C, C++, Pascal, and other languages can call to
create windows, open files, and perform other essential tasks. An
application that wants to display an on-screen message can call
Windows' MessageBox API function, for example.






NCN-83018 



BNF 



Acronym for 'Backus Normal Form' (often incorrectly expanded as
'Backus-Naur Form'), a metasyntactic notation used to specify the
syntax of programming languages, command sets, and the like.
Widely used for language descriptions but seldom documented
anywhere, so that it must usually be learned by osmosis from other
hackers.



DAEMON 



A background process on a host or Web server (normally in a UNIX
environment), waiting to perform tasks. Well-known examples of
daemons are sendmail and the HTTP daemon.



FUNCTION 



A capability available on a host due to the presence of software (e.g., a 
program), a software module (e.g., an API), etc. 



GLOBUS 



Wide area network (WAN) enterprise management and control 
capability developed under DARPA sponsorship by USC/ISI. 



HOST 



A device including a central processor controlled by an operating
system.



ICMP 



Internet Control Message Protocol - ICMP is an extension to the
Internet Protocol. It allows for the generation of error messages, test
packets and informational messages related to IP. It is defined in STD
5, RFC 792.



JEWEL 



An open-source instrumentation package produced by the German
National Research Center for Computer Science.



NFS 



Network File System - A protocol developed by Sun Microsystems, 
and defined in RFC 1094, which allows a computer system to access 
files over a network as if they were on its local disks. This protocol 
has been incorporated in products by more than two hundred 
companies, and is now a de facto Internet standard. 



QoS 



Quality of Service 
REMOS

Remos (REsource MOnitoring System) is a network bandwidth and
topology monitoring system developed under DARPA sponsorship by
CMU. Remos allows network-aware applications to obtain relevant
information about their execution environment. The major challenges
in defining a uniform interface are network heterogeneity, diversity in
traffic requirements, variability of the information, and resource
sharing in the network. Remos provides an API that addresses these
issues by striking a compromise between accuracy (the information
provided is best-effort, but includes statistical information if
available) and efficiency (providing a query-based interface, so
applications incur overhead only when they acquire information).
Remos supports two classes of queries. "Flow queries" provide a
portable way to describe a communication step to the Remos
implementation, which uses its platform-dependent knowledge to
return to the user the capacity of the network to meet this request.
"Topology queries" reverse the process, with the Remos
implementation providing a portable description of the network's
behavior to the application.


SNMP 


Simple Network Management Protocol - Internet standard protocol
defined in STD 15, RFC 1157; developed to manage nodes, e.g., hubs
and switches, on an IP network.



An exemplary system for implementing the Resource Management Architecture
according to the present invention is illustrated in FIGS. 1A, 1B, which includes a plurality
of Host computers A, B, ..., N operatively connected to one another and to Resource
Management hardware RM via a Network 100. It will be appreciated that the hardware
configuration illustrated in FIGS. 1A, 1B constitutes a so-called grid system. It will also be
appreciated that the network 100 advantageously can be any known network, e.g., a local
area network (LAN) or a wide area network (WAN). It will also be appreciated that the
hardware RM need not be a discrete piece of equipment; the hardware RM advantageously
can be distributed across multiple platforms, e.g., the host computer(s), as discussed in detail
below. In addressing the functional elements and applications in the distributed environment,
it will be appreciated that hosts A - N each can instantiate applications 1 - M. Thus, when all
applications are being addressed, these applications will be denoted as A1 - NM.

Still referring to FIGS. 1A, 1B, each of the hosts A, B, etc., preferably is controlled
by an operating system (OSA, OSB, etc.), which permits Host A, for example, to execute
applications A1 - AM, as well as an instrumentation daemon IDA, a Program Control (PC)
agent PCA, and a Host Monitor HMA. It should be noted that instrumentation daemon IDA,
PC agent PCA, and Host Monitor HMA are integral to the Resource Management
Architecture, while the operating system OSA and applications A1 - AM are well known to
one of ordinary skill in the art.

In FIGS. 1A, 1B, the Resource Management Architecture RM advantageously
includes an instrument collector 10 receiving data from all of the instrumentation daemons
(IDA - IDN) and providing data to instrument correlator(s) 20, which, in turn, provide
correlation data to corresponding quality of service (QoS) managers 30. Resource
Management Architecture RM also receives data from host monitors HMA - HMN at history
servers 40, which maintain status and performance histories on each of the hosts A - N and
provide selected information to host load analyzer 50. Analyzer 50 advantageously
determines the host and network loads for both hosts A - N and their connecting network 100
and provides that information to Resource Manager 60, which is the primary decision-making
component of the Resource Management Architecture. It will be appreciated that Resource
Manager 60 also receives information from the QoS managers 30 and exchanges information
with program controller 70. Program controller 70 sends startup and shutdown orders to the
Program Control Agents based on operator-initiated or Resource Manager-initiated orders. It will be
appreciated that the operator-initiated orders are received via one of the program control
displays 80.
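The data flow just described (instrumentation daemons to instrument collector 10, correlator 20, QoS manager 30, and on to Resource Manager 60, which also consumes host loads from analyzer 50) can be sketched as a chain of plain functions. This is only an illustration under stated assumptions: the function names, the latency-only reports, the "worst latency" correlation rule, and the "least-loaded host" placement rule are all hypothetical, not the components' actual interfaces.

```python
from collections import defaultdict


def collect(instrumentation_reports):
    """Instrument collector 10: group raw (app, latency_ms) reports from
    daemons IDA - IDN by application."""
    by_app = defaultdict(list)
    for app, latency_ms in instrumentation_reports:
        by_app[app].append(latency_ms)
    return by_app


def correlate(by_app):
    """Instrument correlator 20: reduce raw reports to one metric per
    application (here, the worst observed latency; an assumed rule)."""
    return {app: max(values) for app, values in by_app.items()}


def qos_manager(metrics, limit_ms):
    """QoS manager 30: recommend scaling up every application over its limit."""
    return [app for app, worst in metrics.items() if worst > limit_ms]


def resource_manager(recommendations, host_loads):
    """Resource Manager 60: turn recommendations into start orders for
    program controller 70, targeting the least-loaded host (assumed rule)."""
    if not recommendations:
        return []
    target = min(host_loads, key=host_loads.get)
    return [("start", app, target) for app in recommendations]
```

For example, reports `[("track", 20.0), ("track", 70.0), ("display", 10.0)]` with a 50 ms limit and host loads `{"A": 0.8, "B": 0.2}` would yield a single order to start another copy of `track` on host `B`.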

As will be discussed in greater detail below, the Resource Manager 60 is the primary 
decision-making component of the Resource Management Architecture. The Resource 
Manager 60 is responsible for determining: 

• how to respond to host and application failures; 

• where (i.e., which of hosts A -N) to place new applications; 

• which applications to start up in response to the detection of a new host (host 
N+1); 

• how to resolve application dependencies; 

• what applications should be started, stopped, or moved in response to
application system priority changes; and

• based on recommendations from the QoS Managers, when and where
scalable applications should be started or stopped.
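The placement question in the list above ("where to place new applications") can be illustrated with a simple fitness-based choice, in the spirit of the host fitness scores assigned by the Host Load Analyzer described later. The weights, the `HostLoad` fields, and the rule that a host already running a copy is skipped are assumptions made for this sketch; they are not taken from the Resource Manager itself.

```python
from dataclasses import dataclass


@dataclass
class HostLoad:
    name: str
    cpu_util: float   # 0.0 - 1.0
    mem_util: float   # 0.0 - 1.0
    up: bool = True


def fitness(h, cpu_weight=0.7, mem_weight=0.3):
    """Hypothetical fitness score: higher means more spare capacity."""
    return cpu_weight * (1.0 - h.cpu_util) + mem_weight * (1.0 - h.mem_util)


def place(app, hosts, already_running):
    """Choose a host for a new copy of `app`, or return None.

    Skips hosts that are down or that already run a copy of `app`
    (an assumed anti-colocation rule for survivability), then picks
    the remaining host with the highest fitness score.
    """
    candidates = [h for h in hosts
                  if h.up and (app, h.name) not in already_running]
    if not candidates:
        return None
    return max(candidates, key=fitness).name
```

Weighting CPU more heavily than memory is one plausible choice; any monotone score would fit the same selection loop.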

Before leaving FIGS. 1A, 1B, it should be noted that the functions, e.g., instantiated
programs or software program modules, in the Resource Management Architecture
advantageously can be distributed across multiple platforms, e.g., multiple hosts (which may
or may not be the illustrated Hosts A - N) or a grid system.

The major functional groups of the Resource Management Architecture according to
the present invention are illustrated in FIGS. 2A, 2B. The functions illustrated as solid boxes
are components of the Resource Management Architecture and are fully described below;
the functions denoted by diagonal striping denote third-party software which has been
integrated with the Resource Management Architecture but does not provide core
functionality. Thus, the latter functions will be described only to the extent necessary to
provide integration details. Moreover, it will be appreciated that the functions and
functionality of the Resource Management Architecture according to the present invention
are interconnected to one another via middleware, which provides message passing interfaces
between substantially all of the Resource Management functions. This middleware package,
RMComms, is fully described below.

The major functional groups provided by the Resource Management Architecture in
an exemplary embodiment of the present invention are illustrated in FIGS. 2A, 2B. A
summary of the functions provided by the Resource Management Architecture is available
in Attached Appendix A. These functions, taken together, provide an integrated capability
for monitoring and control of a distributed computing environment. In addition, many of the
functions (and functional groups) within the Resource Management Architecture can also be
run in a non-integrated configuration, thus providing subsets of the integrated Resource
Management capabilities.

The functional groups illustrated in FIGS. 2A, 2B include:

FG1 - Host and Network Monitoring. This function group consists of software which
monitors the host and network resources within the distributed environment. The
function group collects extensive run-time information on host and network
configuration, statuses, and performance. Run-time capabilities for discovering new
hosts that have been started and for determining that existing hosts have gone down
are also provided. Distribution of current and historical status and performance data
to other components of the Resource Management Architecture is also provided. A
more detailed discussion is provided below.

FG2 - Application-Level Instrumentation. The instrumentation function group provides
general-purpose application event reporting and event correlation capabilities.
Capabilities are provided for collecting and correlating application-provided data
such as application statuses, states, performance, and internally detected errors.
Low-overhead (API) libraries are provided for applications to use in sending out key
internal event and performance data. This application data is forwarded to other
components of the instrumentation subsystem which collect data from applications
on hosts throughout the distributed environment. The system also provides
grammar-driven capabilities for correlating, combining, and reformatting application data into
higher-level metrics (composite events) for use by displays or other Resource
Management components.
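As an illustration of combining low-level application events into a higher-level composite event, the sketch below pairs a hypothetical "recv" event with the matching "sent" event for the same message and emits a latency. The event tuple layout and the single hard-coded rule are assumptions; the grammar-driven correlator described above is configurable and is not reproduced here.

```python
def correlate_latency(events):
    """Combine raw (kind, msg_id, timestamp_s) events into composite
    (msg_id, latency_s) events.

    Hypothetical rule: each 'recv' event is matched with the later 'sent'
    event carrying the same msg_id. Unmatched 'sent' events are ignored.
    """
    pending = {}       # msg_id -> time the message was received
    composites = []
    for kind, msg_id, ts in events:
        if kind == "recv":
            pending[msg_id] = ts
        elif kind == "sent" and msg_id in pending:
            composites.append((msg_id, ts - pending.pop(msg_id)))
    return composites
```

Emitting one composite event per processed message, instead of two raw ones, is what keeps the data volume reaching displays and QoS managers small.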

FG3 - System Specifications. A specification language has been developed which allows
the user to specify:

1) application software system structure, capabilities, dependencies, and
requirements; and

2) hardware system (computer and network) structure, capabilities, and
configuration.

Specification files, based on this specification language, are created by the user and
provide the model of the software and hardware components of the distributed
computing environment which is used by other Resource Management functions. The
specification information is accessed by other Resource Management functions by
linking in a specification parser library and making library calls to read in the files
and convert them to an internal object model. Specific specification data items can
then be retrieved via an object-oriented API. See the discussion below.
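The pattern of parsing specification files into an internal object model and then querying it through an object-oriented API can be sketched as follows. The one-item-per-line syntax (`application <name> [requires <dep> ...]`, `host <name> cpus <n>`) and the class and method names are invented for this illustration; they are not the actual specification language or parser library.

```python
from dataclasses import dataclass, field


@dataclass
class AppSpec:
    name: str
    requires: list = field(default_factory=list)


@dataclass
class HostSpec:
    name: str
    cpus: int


class SystemSpec:
    """Toy object model built from a hypothetical spec syntax."""

    def __init__(self, text):
        self.apps, self.hosts = {}, {}
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
            if not line:
                continue
            words = line.split()
            if words[0] == "application" and (
                    len(words) == 2 or (len(words) > 3 and words[2] == "requires")):
                self.apps[words[1]] = AppSpec(words[1], words[3:])
            elif words[0] == "host" and len(words) == 4 and words[2] == "cpus":
                self.hosts[words[1]] = HostSpec(words[1], int(words[3]))
            else:
                raise ValueError(f"unrecognized spec line: {raw!r}")

    def dependencies(self, app):
        """Applications that must be running before `app` is started."""
        return self.apps[app].requires
```

A consumer such as the Resource Manager would call `dependencies("track")` to resolve start-up order without knowing anything about the file format.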

FG4 - Resource Allocation Decision-Making. This subsystem provides the reasoning and
decision-making capabilities of the Resource Management Architecture. The
components of this subsystem use information from other subsystems in order to
determine the health and state of the distributed environment and the options that are
available for attempting to recover from faults or unacceptable performance. The
functions in this particular functional group make decisions regarding:

1) where new applications should be started;

2) whether and where failed applications should be restarted;

3) based on application inter-dependencies, whether and where additional
applications should be started prior to starting a particular application;

4) whether applications are meeting performance requirements and whether 
and where an application can be scaled up or moved when it is necessary to 
improve performance; 

5) whether scalable applications are performing well within performance 
requirements and can be scaled down and which copy should be brought 
down; and 

6) based on operator changes to application system priorities, whether and 
where new applications need to be started or whether and which existing 
applications need to be shut down. 
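Decisions 4) and 5) above can be sketched as a threshold policy with a gap between the scale-up and scale-down conditions, plus a rule for choosing which copy to bring down. The `low_fraction` value and the "stop the copy on the busiest host" rule are assumptions for illustration, not the decision logic of the Resource Manager.

```python
def scaling_decision(latency_ms, required_ms, copies, low_fraction=0.5):
    """Return 'up', 'down', or 'hold' for one scalable application.

    Assumed policy: scale up when the measured latency violates the
    requirement; scale down only when latency is well inside it (below
    low_fraction * required_ms) and more than one copy is running.
    The gap between the two thresholds helps avoid oscillation.
    """
    if latency_ms > required_ms:
        return "up"
    if latency_ms < low_fraction * required_ms and copies > 1:
        return "down"
    return "hold"


def copy_to_stop(copy_hosts, host_cpu):
    """Pick the copy whose host is most heavily loaded.

    copy_hosts maps copy id -> host name; host_cpu maps host name -> CPU
    utilization. Freeing the busiest host is one plausible choice.
    """
    return max(copy_hosts, key=lambda copy: host_cpu[copy_hosts[copy]])
```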

FG5 - Application (Resource) Control. This subsystem provides application control (i.e.,
Program Control) capabilities which permit starting, stopping, and configuring
applications on each of the hosts in the distributed environment. The subsystem
provides both interactive operator control of the distributed environment as well as
automatic control via configuration orders received from the Resource Allocation
Decision-Making Subsystem (i.e., the Resource Manager component). The
interactive controls allow an operator to create, load, save, and edit pre-defined
system configurations (e.g., lists of applications that are to be run, with or without
specific host mappings), determine the status and configuration of currently running
programs, and start and stop any or all applications. Both static (operator-entered)
mappings of applications to hosts and dynamic mappings of applications to hosts
(where the Resource Allocation Decision-Making Subsystem will be queried to
determine the proper mapping at run-time) can be defined. The subsystem also
provides application fault detection capabilities which are triggered by the
unexpected death of an application that was started by the subsystem. A basic host
fault detection capability is also provided which is triggered based on failure to
receive heartbeat messages from subsystem components running on a particular host.
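The basic host fault detection described above can be sketched as a heartbeat timeout check. The class name and the 3-second timeout are assumptions; timestamps are passed in explicitly (e.g. from `time.monotonic()`) so the logic is easy to test.

```python
class HeartbeatMonitor:
    """Declare a host failed when no heartbeat has arrived within timeout_s.

    The class name and default timeout are hypothetical.
    """

    def __init__(self, timeout_s=3.0):
        self.timeout_s = timeout_s
        self.last_seen = {}   # host -> time of last heartbeat

    def heartbeat(self, host, now):
        """Record a heartbeat message from a component on `host`."""
        self.last_seen[host] = now

    def failed_hosts(self, now):
        """Hosts whose last heartbeat is older than the timeout, sorted."""
        return sorted(h for h, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

Choosing the timeout is a trade-off: a short timeout detects failures quickly but risks false alarms when heartbeats are merely delayed.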

FG6 - Displays. The display subsystem provides capabilities for visualizing the status, 
performance, and health of the hosts, networks, and applications in the distributed 
environment. Capabilities are also provided for visualizing the status, performance, 
and health of the Resource Management components themselves. 

As mentioned above, the RMComms middleware package provides the internal 
message passing interfaces between substantially all of the Resource Management functions 
both within each functional group and between the various functional groups. The 
middleware provides for automatic location-transparent many-to-many client-server 
connections. Low-overhead, reliable message passing capabilities are provided. Registration 
of message handler callback functions for specified requested message types is provided with 
the message handler functions being invoked when messages arrive. Registration of 
connection status callback functions, which are invoked when either new connections are 
made or existing connections are broken, is also provided. The middleware package also 
allows for multiple client and server objects to be instantiated in the same application, is 
thread-safe, and provides an easy-to-use object-oriented API through which all capabilities 
are accessed. 
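The registration of message handler callbacks per message type, and of connection status callbacks, can be illustrated with a small in-process dispatcher. This is not the RMComms API: the class and method names are hypothetical, there is no networking, and a lock simply guards the registration tables so that handlers can be registered from any thread.

```python
import threading
from collections import defaultdict


class MessageDispatcher:
    """In-process sketch of per-message-type callback registration.

    Hypothetical names; not the RMComms interface.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._handlers = defaultdict(list)   # message type -> callbacks
        self._status_handlers = []           # callbacks(peer, connected)

    def register_handler(self, msg_type, callback):
        with self._lock:
            self._handlers[msg_type].append(callback)

    def register_connection_status(self, callback):
        with self._lock:
            self._status_handlers.append(callback)

    def deliver(self, msg_type, payload):
        """Invoke every handler registered for msg_type; return how many ran."""
        with self._lock:
            handlers = list(self._handlers.get(msg_type, ()))
        for handler in handlers:      # call outside the lock to avoid deadlock
            handler(payload)
        return len(handlers)

    def connection_changed(self, peer, connected):
        """Notify status callbacks that a connection was made or broken."""
        with self._lock:
            handlers = list(self._status_handlers)
        for handler in handlers:
            handler(peer, connected)
```

Copying the handler list under the lock and invoking callbacks outside it lets a handler safely register further handlers without deadlocking.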

Each functional group and each function instantiated within
each of the function groups FG1 - FG6 of the exemplary embodiment of the Resource
Management Architecture illustrated in FIGS. 2A, 2B, including the capabilities provided
by the functional group or function, will now be described in greater detail. The discussion
below also includes an overview of the information flow between function blocks within the
same functional group and between function blocks in separate functional groups.

FG1 - Host and Network Monitoring Functional Group

Functional group FG1 provides extensive monitoring capabilities at the host and
network levels. The information monitored includes statuses, configuration information,
performance metrics, and detected fault conditions. By monitoring the individual hosts and
network components within the distributed environment, the functional group FG1
determines:

• Accurate State and Performance Information, primarily by gathering the level
of information necessary for accurately determining the state and health of
each machine and network component.

• Distribution of Current Data to Resource Management Components by
providing current performance and status information, either periodically or
on request.

• Distribution of Historical Data to Resource Management Components, thus
providing historical performance and status information on request.

It will be appreciated that the functional group FG1 makes these determinations by
(or while) providing:

• Common Monitored Data Set and Formats, which permit functional group
FG1 to gather the same set of statuses and statistics in the same formats for
each host regardless of machine architecture or operating system.

• Minimally-Intrusive Data Collection Mechanisms, which permit functional
group FG1 to gather the information in as non-intrusive a manner as possible
(in terms of CPU utilization, network bandwidth utilization, etc.).

• Near Real-Time Data Collection Mechanisms, which permit functional
group FG1 to gather the information in as timely a manner as possible.
The Host and Network functional group FG1 includes the four functions set forth below:

1) Host Monitors FG10A - FG10N, which reside on each respective machine
in the distributed environment and collect extensive operating system-level
data for each host A - N.

2) History Servers FG12A - FG12N, which collect data from the Host
Monitors FG10A - FG10N, respectively, maintain status and performance
histories on each host A - N in the distributed environment, i.e., in the
Resource Management Architecture, and provide this information to displays
and other functions within the Resource Management Architecture.

3) Host Discovery Function FG14, which uses Simple Network Management
Protocol (SNMP) calls and Internet Control Message Protocol (ICMP) ping
calls to determine when new hosts, e.g., host N+1, come on-line and if an
existing host, e.g., host K, goes down.

4) Remos Network Data Broker Function FG16, which collects information
on network link bandwidths from the SNMP-based Remos tool (developed
by Carnegie Mellon University) and passes this information to the Host Load
Analyzer function of the Resource Allocation Decision-Making functional
group FG4, both of which are discussed in greater detail below.
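The bookkeeping behind Host Discovery Function FG14 can be sketched independently of the probes themselves: each round supplies the set of hosts that answered (via ICMP ping or SNMP, which this sketch leaves out), and the tracker reports hosts that came on-line or went down. Requiring several consecutive missed rounds before declaring a host down is an assumption added here to tolerate dropped ping replies; the class and parameter names are hypothetical.

```python
class HostDiscovery:
    """Track host up/down transitions from successive probe rounds."""

    def __init__(self, misses_before_down=2):
        self.misses_before_down = misses_before_down
        self.misses = {}    # host -> consecutive missed probe rounds
        self.up = set()     # hosts currently believed to be up

    def probe_round(self, responding):
        """Feed one round of probe replies; return (came_up, went_down)."""
        responding = set(responding)
        came_up = sorted(responding - self.up)
        for host in responding:
            self.misses[host] = 0
        went_down = []
        for host in sorted(self.up - responding):
            self.misses[host] += 1
            if self.misses[host] >= self.misses_before_down:
                went_down.append(host)
        self.up |= set(came_up)
        self.up -= set(went_down)
        return came_up, went_down
```

For instance, with the default setting a host that misses one round is still considered up, and is reported down only after it misses a second consecutive round.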

Host Monitors FG10A - FG10N, which monitor the status and performance of hosts
A - N, respectively, are instantiated on each host machine within the distributed environment.
Host Monitors FG10A - FG10N employ operating system-level mechanisms to retrieve
status, configuration, and performance information on each host A - N. The information
retrieved includes:

1) operating system version and machine configuration;

2) CPU configuration, status, and utilization;

3) memory configuration and usage;

4) network configuration, status, and utilization;

5) filesystem configuration, status, and utilization; and

6) process statuses including CPU, memory, network, and filesystem utilization
for each process.
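A host monitor gathers the items above through operating system-level mechanisms. As a rough, portable illustration only, the sketch below collects a small subset (items 1, 2, and 5) using the Python standard library; the dictionary keys are hypothetical, and items a platform cannot supply (for example, `os.getloadavg` does not exist on Windows) are reported as `None` instead of being faked.

```python
import os
import platform
import shutil
import time


def collect_host_sample(path="/"):
    """Gather a small, OS-independent subset of host monitoring data.

    Keys are hypothetical; unavailable items are reported as None.
    """
    disk = shutil.disk_usage(path)
    try:
        load = os.getloadavg()[0]          # 1-minute load average (Unix only)
    except (AttributeError, OSError):
        load = None
    return {
        "timestamp": time.time(),
        "os": f"{platform.system()} {platform.release()}",   # item 1
        "cpu_count": os.cpu_count(),                            # item 2
        "load_avg_1min": load,                                  # item 2
        "fs_used_fraction": disk.used / disk.total,             # item 5
    }
```

A production monitor would read richer per-process and per-interface data (e.g. from `/proc` on Linux), which is exactly why the normalized message formats described next are needed.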

While Host Monitors FG10A - FG10N are primarily responsible for monitoring the status
of a particular host, they also provide information on network load as seen by that particular
host. In the same manner, the Host Monitors FG10A - FG10N also provide information and
statistics concerning any remotely mounted filesystems, e.g., Network File System (NFS).

The information that the Host Monitors FG10A - FG10N collect advantageously can
be formatted into operating system-independent message formats. These message formats
provide a pseudo-standardized set of state, status, and performance information which is
useful to other components of the Resource Management Architecture; i.e., other components
do not have to be aware of or deal with the minor differences between platform data formats
and semantics. Because not all of the state and performance data is available on every
platform, a group of flags is set in the host configuration message to indicate whether
specific data items are valid on a particular platform.
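The validity-flag idea can be illustrated with a minimal Python sketch. It assumes a hypothetical set of validity bits (CPU, memory, paging, network, filesystem) and a hypothetical `HostStatusMessage` type; the actual message layout used by the Host Monitors is not specified here. Each platform-specific reading that is actually available sets its flag, and consumers check the flag before reading the value.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Valid(Flag):
    # Hypothetical validity bits, one per optional data item.
    NONE = 0
    CPU = auto()
    MEMORY = auto()
    PAGING = auto()
    NETWORK = auto()
    FILESYSTEM = auto()


@dataclass
class HostStatusMessage:
    hostname: str
    os_version: str
    valid: Valid = Valid.NONE
    values: dict = field(default_factory=dict)

    def has(self, item: Valid) -> bool:
        return item in self.valid

    def value(self, item: Valid):
        # Consumers must not read an item the platform could not supply.
        if not self.has(item):
            raise KeyError(f"{item.name} not available on {self.hostname}")
        return self.values[item]


def build_message(hostname, os_version, readings):
    """Normalize platform-specific readings into the common message.

    `readings` maps Valid members to values (None = not supported);
    only supported items are stored, and their flags are set.
    """
    msg = HostStatusMessage(hostname, os_version)
    for item, val in readings.items():
        if val is not None:
            msg.valid |= item
            msg.values[item] = val
    return msg
```

A consumer such as the Host Load Analyzer would call `msg.has(Valid.PAGING)` before using paging data, so that an item missing on one platform is never mistaken for a zero reading.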

History Servers FG12A - FG12N are responsible for collecting information from the
Host Monitors FG10A - FG10N and maintaining histories on the statuses, statistics, and
performance of each host A - N in the distributed environment. This information
advantageously can be requested by other functions instantiated in the Resource Management
Architecture. Preferably, the primary consumers of the status information obtained by the
History Servers FG12A - FG12N are the Host Load Analyzer (Hardware Broker) component
of the Resource Allocation Decision-Making functional group FG4, the Host Displays FG62A
- FG62N, and the Path Display FG64 of the Displays functional group FG6. The Host Load
Analyzer FG40 receives information on host configuration and loads (primarily CPU,
memory, and network data) from History Servers FG12A - FG12N and employs this
information to assign host fitness scores. Each Host Display, e.g., FG62A, receives and
displays current status information on one of the hosts A - N, including process status
information and network connectivity information. Each Host Display can also request that
a respective one of the History Servers FG12A - FG12N provide CPU load information,
network load information, paging activity data, and memory utilization information, which
is used to drive line graph charts for specific selected hosts.

It will be appreciated that History Servers FG12A - FG12N are designed so that 
multiple copies can be run simultaneously. Each of the History Servers FG12A - FG12N 
advantageously can be configured to either monitor all Host Monitors or to monitor only a 
selected set of Host Monitors. It should be mentioned at this point that the History Servers 
FG12A - FG12N determine the list of hosts in the distributed environment that could 
potentially be monitored from the System Specification Library. In this manner, the History 
Servers advantageously can be used to provide survivability (by having multiple History 
Servers connected to each Host Monitor) and/or to perform load-sharing (with the History
Servers FG12A - FG12N each monitoring only a subset of the Host Monitors). It will also 
be appreciated that the History Servers FG12A - FG12N can be configured to periodically 
record history data to disk. These disk files can then be used for off-line analysis of the 
Resource Management Architecture. 
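A History Server's bookkeeping can be sketched as follows, assuming a fixed-depth history per host (the default depth of 100 samples is an arbitrary assumption) and samples delivered as dictionaries of metric names to values. The class and function names are hypothetical. A server created without a host list monitors every Host Monitor; one created with a subset supports load-sharing, and `partition_hosts` shows one possible round-robin split of the host list among several servers.

```python
from collections import defaultdict, deque


class HistoryServer:
    """Keeps a bounded, per-host history of Host Monitor samples."""

    def __init__(self, monitored_hosts=None, depth=100):
        # None means "monitor every Host Monitor"; otherwise only the given subset.
        self.monitored = set(monitored_hosts) if monitored_hosts is not None else None
        self.history = defaultdict(lambda: deque(maxlen=depth))

    def accepts(self, host):
        return self.monitored is None or host in self.monitored

    def record(self, host, timestamp, sample):
        if self.accepts(host):
            # Oldest samples fall off automatically once `depth` is reached.
            self.history[host].append((timestamp, sample))

    def series(self, host, metric):
        """(timestamp, value) pairs for one metric, e.g. to drive a line graph."""
        return [(t, s[metric]) for t, s in self.history.get(host, ()) if metric in s]


def partition_hosts(hosts, n_servers):
    """Round-robin split of the host list among n_servers for load-sharing."""
    ordered = sorted(hosts)
    return [ordered[i::n_servers] for i in range(n_servers)]
```

For survivability rather than load-sharing, the subsets would simply be allowed to overlap, so that each Host Monitor is watched by more than one History Server.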

The Host Discovery function FG14 employs Perl scripts to make SNMP and ICMP
ping calls. These calls are used to periodically scan each subnet and host address in the
distributed environment in an attempt to determine whether there have been any host status
changes. In an exemplary case, the list of hosts and subnets that are to be monitored is read
in from a file; alternatively, this information can reside in and be read from the System
Specification Library, which is discussed in greater detail below.

It should be mentioned that when a new host is first detected, the new host's operating
system configuration is queried by the Host Discovery function FG14 via SNMP calls.
Information on the newly discovered host and its operating system configuration is then sent
to the Program Control function FG50 in application control functional group FG5.
Likewise, when a host fails to respond to multiple SNMP and ping queries, a message
indicating that the host appears to have gone down is sent to the Program Control function
FG50.
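The new-host and host-down logic can be sketched in Python (the described implementation uses Perl scripts). The `probe` callable stands in for the SNMP and ICMP ping calls, and the default threshold of three consecutive missed probes is an assumed value for "multiple" failed queries; both are hypothetical. Each scan returns the hosts to report to Program Control as newly discovered and as apparently down.

```python
class HostDiscovery:
    """Tracks host reachability across periodic scans."""

    def __init__(self, max_misses=3):
        self.max_misses = max_misses  # assumed threshold for "multiple" failures
        self.misses = {}              # known host -> consecutive missed probes
        self.down = set()

    def scan(self, candidates, probe):
        """probe(host) -> bool stands in for the SNMP/ICMP queries.

        Returns (newly_discovered, newly_down).
        """
        new, gone = [], []
        for host in candidates:
            if probe(host):
                # First sighting, or a host that was down has come back.
                if host not in self.misses or host in self.down:
                    new.append(host)
                    self.down.discard(host)
                self.misses[host] = 0
            elif host in self.misses and host not in self.down:
                self.misses[host] += 1
                if self.misses[host] >= self.max_misses:
                    self.down.add(host)
                    gone.append(host)
        return new, gone
```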

The final component of the Host and Network Monitoring functional group FG1 is
the Remos Network Data Broker FG16, which receives information on network link
bandwidth and network link bandwidth utilization from the SNMP-based Remos network
monitoring tool mentioned above. The network information is accessed via the Remos
application programming interface (API) library and is then sent on to the Host Load
Analyzer (Hardware Broker) function FG40 of the Resource Allocation Decision-Making
functional group FG4. The network information received from Remos consists of the
maximum potential bandwidth and the current bandwidth utilization on specific host network
links. As mentioned above, the Remos network monitoring tool FG16 is not a core component
of the Resource Management Architecture; that being the case, no further details on either
Remos or the Remos Network Data Broker are provided in the instant application.

FG2 - Application-Level Instrumentation Functional Group 

The Instrumentation functional group FG2 advantageously provides general-purpose
application event reporting and event correlation capabilities. The Instrumentation functional
group permits instrumented application data to be easily accessible to other components of
the Resource Management Architecture. The functional group provides capabilities for
collecting and correlating application-provided data such as application statuses, states,
performance, and internally detected errors. Low-overhead APIs are provided that the
applications can use for sending internal event and performance data to the instrumentation
components. The Instrumentation functional group FG2 can collect data from applications
on hosts A - N throughout the distributed environment. The functional group also provides
grammar-driven capabilities for correlating, combining, and reformatting application data
into higher-level metrics (composite events) for use by displays or other functional groups
of the Resource Management Architecture.



The Instrumentation functional group provides: 

• open APIs and non-proprietary architecture

• near real-time monitoring support

• cross-language support: C, C++, Ada

• cross-platform support: Solaris, IRIX, Linux, etc... 

• simple, easy-to-use APIs

• low-intrusive instrumentation interface 

• instrumentation interface that does not significantly change the run-time 
behavior of the applications 

• support for passing a wide range of data types

• support for data marshalling / unmarshalling (system independent data 
formats) 

• support for adding to or changing the information being instrumented without
having to recompile portions of the architecture unaffected by the changes
(preferably, no recompilation should be necessary except for recompilation
of the application being instrumented and any evaluation logic or displays that have
been affected by the changes)

• scalable architecture (100+ hosts / 20+ apps per host / 5+ threads per app)

• ability for the architecture to perform auto-configuration as required 

• ability to run multiple tests, multiple displays and multiple data logging 
components simultaneously 
• ability to abstract away the underlying connectivity/communications between
infrastructure components. 

• ability for instrumentation infrastructure to be brought up and down while the 
application is running 

• ability to easily build and configure new displays and data logging 
components (interactive configuration is preferable) 

• ability to easily build and configure new performance and data correlation 
components (interactive configuration is preferable) 

• backwards compatibility with existing Jewel Instrumentation displays
(protect investments in existing display capabilities)

• backwards compatibility with existing Jewel Instrumentation function calls
(provide ease of transition / backfit)

As illustrated in FIGS. 2A, 2B, the Instrumentation functional group FG2 includes
the components enumerated below. In addition, the Instrumentation APIs and Jewel
Instrumentation will be addressed along with the Instrumentation functional group, i.e., the
Instrumentation functional group includes:

1) Instrumentation API Libraries FG20 are linked with the applications and
provide the function call interfaces by which these applications send
instrumentation data.

2) Instrumentation Daemons FG22A - FG22N reside on each host in the 
distributed environment and are responsible for reading instrumentation data 
sent out by the applications, reformatting the data into instrumentation event 
messages and sending the messages to the Instrumentation Collectors. 

3) Instrumentation Collectors FG24A - FG24N connect to the Instrumentation
Daemons FG22A - FG22N on each host and receive instrumentation
messages from hosts A - N. The Collectors forward received messages to the
Instrumentation Correlators FG26A - FG26N and Instrumentation Brokers
FG28A - FG28N.

4) Instrumentation Correlators FG26A - FG26N receive instrumentation 
messages from the Instrumentation Collectors FG24A - FG24N and provide 
grammar-driven capabilities for correlating, combining, and reformatting
application data into higher-level metrics (composite events) for use by
displays or other functions of the Resource Management Architecture.

5) Instrumentation Brokers FG28A - FG28N receive instrumentation
messages from the Instrumentation Collectors and perform task-specific
reformatting and data manipulation for driving displays or other Resource 
Management components. 

6) Jewel Instrumentation Broker (QoS Monitor) FG29 (a legacy component) 
receives instrumentation data from either the open source Jewel 
instrumentation package or from the Instrumentation Collectors. The QoS
Monitor FG29 performs task-specific message reformatting and data 
manipulation for driving displays and the QoS Managers FG44A - FG44N. 

The applications, e.g., A1 - AN, link in the Instrumentation API Library FG20 and
make API calls to construct and send out instrumentation event messages. Three separate 
APIs are provided for use by the applications: 

1) a printf()-style API which allows the code to format, build, and send
instrumentation data with a single function call;

2) a buffer-construction-style API in which multiple function calls are made to
construct the instrumentation buffer iteratively, one data element per call; and

3) a Jewel function call API based on the existing API provided by the Jewel 
instrumentation package (an open-source package produced by the German 
National Research Center for Computer Science). 

The first two APIs are the preferred programming interfaces and take advantage of several
key instrumentation features, while the Jewel API is provided solely for backwards
compatibility with existing instrumented application code and is implemented as a set of
wrappers around the printf()-style API. All three APIs are supported for C and C++. Ada
bindings have also been produced for the buffer-construction-style API and the Jewel
function call API.
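The difference between the two preferred API styles can be sketched in Python, although the actual libraries are C, C++, and Ada. The function and class names (`instr_printf`, `InstrumentationBuffer`) and the one-letter type codes are hypothetical. Both styles end up producing the same triple of event name, format string, and data values.

```python
# Hypothetical one-letter type codes shared by both API styles.
_TYPE_CODES = {int: "d", float: "f", str: "s"}


def instr_printf(event_name, fmt, *values):
    """printf()-style: format string and all values in a single call."""
    if len(fmt) != len(values):
        raise ValueError("format string and value count differ")
    for code, value in zip(fmt, values):
        if _TYPE_CODES.get(type(value)) != code:
            raise TypeError(f"value {value!r} does not match code {code!r}")
    return (event_name, fmt, tuple(values))


class InstrumentationBuffer:
    """Buffer-construction style: one data element per call."""

    def __init__(self, event_name):
        self.event_name = event_name
        self._codes = []
        self._values = []

    def add(self, value):
        code = _TYPE_CODES.get(type(value))
        if code is None:
            raise TypeError(f"unsupported type {type(value).__name__}")
        self._codes.append(code)
        self._values.append(value)
        return self  # allows chained calls

    def build(self):
        return (self.event_name, "".join(self._codes), tuple(self._values))
```

The single-call style suits events whose contents are known at one point in the code, while the buffer style suits data that is assembled piecemeal, for example inside a loop.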

Preferably, the instrumented data is sent from the application to one of the 
Instrumentation Daemons FG22A - FG22N on a respective one of the hosts A - N where the
application is running. The currently preferred mechanism for data transfer is via UNIX FIFO 
(first in - first out) IPC (inter-process communication) mechanisms. It will be appreciated 
that the FIFO mechanism was chosen based on reliability, low overhead, and ease of 
implementation. Alternative data passing mechanisms including shared message queues are 
considered to be within the scope of the present invention. 

As mentioned above, an Instrumentation Daemon resides on each host in the 
distributed environment. The Instrumentation Daemon is interrupted whenever new data is 
written to the FIFO. The Instrumentation Daemon reads the data from the FIFO, reformats 
the data into the standard internal Instrumentation message format (discussed below), and 
sends the data to each of the respective Instrumentation Collectors FG24A - FG24N that are
currently active. Alternatively, an event request filtering mechanism can be implemented so
that specific event messages will only be sent to those Instrumentation Collectors
FG24A - FG24N that have requested the message.

The standard instrumentation message format includes a header, a format string 
describing the application-provided data contained in the message, and the actual data values. 
The message components are illustrated in Attached Appendix B. 
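One possible encoding of a header, format string, and data values is sketched below using Python's `struct` module in network byte order, which gives the system-independent marshalling called for above. The header fields, their sizes, the magic number, and the type codes are assumptions for illustration only; the actual layout is the one given in Appendix B.

```python
import struct

# Hypothetical fixed header: magic, version, host id, process id,
# timestamp in milliseconds, and length of the format string.
HEADER = struct.Struct("!HBIIQH")
MAGIC = 0x1A57
VERSION = 1

# Hypothetical type codes: 'd' = 32-bit int, 'f' = 64-bit float,
# 's' = UTF-8 string prefixed by a 16-bit length.
FIXED = {"d": struct.Struct("!i"), "f": struct.Struct("!d")}
STRLEN = struct.Struct("!H")


def pack_event(host_id, pid, timestamp_ms, fmt, values):
    if len(fmt) != len(values):
        raise ValueError("format string and value count differ")
    body = bytearray()
    for code, value in zip(fmt, values):
        if code in FIXED:
            body += FIXED[code].pack(value)
        elif code == "s":
            raw = value.encode("utf-8")
            body += STRLEN.pack(len(raw)) + raw
        else:
            raise ValueError(f"unknown type code {code!r}")
    fmt_raw = fmt.encode("ascii")
    header = HEADER.pack(MAGIC, VERSION, host_id, pid, timestamp_ms, len(fmt_raw))
    return header + fmt_raw + bytes(body)


def unpack_event(data):
    magic, version, host_id, pid, ts, fmt_len = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an instrumentation message")
    offset = HEADER.size
    fmt = data[offset:offset + fmt_len].decode("ascii")
    offset += fmt_len
    values = []
    for code in fmt:
        if code in FIXED:
            (value,) = FIXED[code].unpack_from(data, offset)
            offset += FIXED[code].size
        elif code == "s":
            (length,) = STRLEN.unpack_from(data, offset)
            offset += STRLEN.size
            value = data[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError(f"unknown type code {code!r}")
        values.append(value)
    return {"version": version, "host_id": host_id, "pid": pid,
            "timestamp_ms": ts, "fmt": fmt, "values": tuple(values)}
```

Because the format string travels with the data, a receiver can decode a message it has never seen before, which is what allows instrumented content to change without recompiling unaffected components.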

The Instrumentation Collectors FG24A - FG24N receive instrumentation messages
from the Instrumentation Daemons FG22A - FG22N on each host A - N, respectively, in the
distributed environment. Currently, the Instrumentation Collectors FG24A - FG24N send
every instrumentation message to all Instrumentation Brokers FG28A - FG28N and
Instrumentation Correlators (Brokers) FG26A - FG26N that have connected to the
Instrumentation Collectors FG24A - FG24N. The Instrumentation Collectors FG24A - FG24N
serve as pass-through servers for instrumentation messages. Even so, the Instrumentation
Collectors support architecture scalability: without them, the Instrumentation Broker FG29
and Instrumentation Correlators FG26A - FG26N would need to maintain connections to the
Instrumentation Daemons FG22A - FG22N on every host. As discussed above, an event
request filtering mechanism advantageously can be implemented so that specific event
messages will only be sent to those Instrumentation Brokers / Instrumentation Correlators
that have requested the message.
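The pass-through role of a Collector, together with the optional event request filtering, can be sketched as follows. Consumers (Brokers or Correlators) register a delivery callback and, optionally, the set of event names they want; a consumer that registers no set receives every message, matching the current behavior. All names are hypothetical.

```python
class InstrumentationCollector:
    """Pass-through fan-out of instrumentation messages to connected consumers."""

    def __init__(self):
        # List of (deliver callback, wanted event names or None for "everything").
        self._consumers = []

    def connect(self, deliver, events=None):
        wanted = frozenset(events) if events is not None else None
        self._consumers.append((deliver, wanted))

    def forward(self, event_name, message):
        """Send one message to every interested consumer; return how many got it."""
        count = 0
        for deliver, wanted in self._consumers:
            if wanted is None or event_name in wanted:
                deliver(event_name, message)
                count += 1
        return count
```

Placing the filter in the Collector means unwanted messages never cross the network to a Broker or Correlator, at the cost of the Collector holding per-consumer subscription state.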

Preferably, the Instrumentation Correlators FG26A - FG26N provide grammar-driven
capabilities for correlating, combining, and reformatting application data into higher-level 
metrics (composite events) for use by displays or other components of the Resource 
Management Architecture. Each Correlator reads in a user-specified correlation grammar file 
which is interpreted at run-time by the Correlator's instrumentation correlation engine. 
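The correlation grammar itself is not reproduced here. As a rough stand-in for what a single rule might do, the sketch below pairs a start event and an end event that share a key and emits a composite latency event. The event names and the returned tuple are hypothetical and are not the Correlator's actual grammar or output format.

```python
class LatencyCorrelator:
    """Pairs a start event and an end event sharing a key, and emits a
    composite ('latency', key, elapsed) event when the pair completes."""

    def __init__(self, start_event, end_event):
        self.start_event = start_event
        self.end_event = end_event
        self._pending = {}  # key -> timestamp of the start event

    def feed(self, event_name, key, timestamp):
        if event_name == self.start_event:
            self._pending[key] = timestamp
        elif event_name == self.end_event and key in self._pending:
            started = self._pending.pop(key)
            return ("latency", key, timestamp - started)
        return None  # no composite event produced
```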

The Instrumentation Brokers FG28A - FG28N are task-specific applications built
around a common code package. The Instrumentation Brokers FG28A - FG28N receive
instrumentation messages from the Instrumentation Collectors FG24A - FG24N, filter all
received instrumentation messages to find the messages of interest, and perform task-specific 
message data reformatting and manipulation for driving other components such as displays 
or other components of the Resource Management Architecture. This Instrumentation Broker 
approach permits instrumentation data sources to be quickly integrated for test, display, and 
debugging purposes. 

It should be mentioned at this point that the Jewel Instrumentation Broker FG29
(hereafter referred to as the QoS Monitor) is a legacy architecture component that served as a
broker between the Jewel instrumentation package components and Resource Management
components and displays. The QoS Monitor FG29 was responsible for polling the Jewel
Collector components to retrieve application event messages. These messages were then
reformatted and used to drive several displays and the QoS Managers FG44A - FG44N. The
Jewel instrumentation package has now been replaced in all applications; however, the
message reformatting capabilities of the QoS Monitor have been maintained so that several
displays and the existing QoS Manager interface do not have to be upgraded immediately.
The QoS Monitor component has been modified so that it receives instrumentation data from
both Jewel and the Instrumentation Collectors.

FG3 - SYSTEM SPECIFICATIONS FUNCTIONAL GROUP 

Still referring to FIGS. 2A, 2B, it should be noted that a System Specification 
Language has been developed which allows the user to specify both (1) software system 
structure, capabilities, dependencies, and requirements, and (2) hardware system (computer 
and network) structure, capabilities, and configuration. System Specification Files, generally 
denoted FG32, which are based on this specification language, are created by the user and 
provide a model of the software and hardware components of the distributed computing 
environment which is used by the Resource Management Architecture. The language 
grammar advantageously can capture the following information related to the distributed 
environment and the applications that can run within the distributed environment:
Hardware and Operating Systems
    Hardware Configuration
    Network Configuration
    Operating Systems and Version
Software
    Systems, Subsystems, Paths, Applications, Processes
    Resource Requirements
    QoS Requirements (Events)
    Survivability Requirements
Data Flow Path Information: Structure and QoS Requirements

It will be appreciated that the System Specification Language allows for grouping 
hardware and software components into systems and subsystems in order to create a 
hierarchy of components. Each application system and subsystem can be assigned a priority 
which is used at run-time to determine the relative importance of applications running in the 
distributed environment. 

At the application level, the hardware, operating system, and other host requirements 
for each application can be specified along with information describing how to start up, 
configure, and shut down the application. This information can include:

a) environment variables that need to be set; 

b) the working directory for running the application; 

c) the path(s) and file name of the application; 

d) command-line arguments that should be set, including arguments that need 
to be resolved at run-time (e.g., the hostname where another application is 
running, the current date, the current userid, a unique run-time identifier
number, etc.); 

e) whether the application needs to run in an xterm; 

f) whether a script file or a signal should be used to shut down the application; and

g) which script or signal should be used. 
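Resolving run-time command-line arguments, item d) above, can be sketched with Python's `string.Template`. The `${...}` placeholder syntax, the placeholder names, and the `default_context` helper are assumptions, not the System Specification Language syntax. Values that only become known at placement time, such as the host where another application is running, are supplied by the caller.

```python
import uuid
from datetime import date
from string import Template


def default_context():
    """Hypothetical run-time values available to every application."""
    return {
        "date": date.today().isoformat(),
        "run_id": uuid.uuid4().hex[:8],  # unique run-time identifier
    }


def resolve_arguments(arg_templates, context):
    """Substitute ${name} placeholders in each command-line argument.

    Raises KeyError for an unresolved placeholder, so an application is
    never launched with a half-resolved command line.
    """
    return [Template(arg).substitute(context) for arg in arg_templates]


ctx = default_context()
ctx["tracker_host"] = "hostB"  # e.g. where another application is running
print(resolve_arguments(["-server", "${tracker_host}", "-run", "${run_id}"], ctx))
```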

In addition, startup and shutdown dependencies between applications can be specified. 
Moreover, application states can be defined based on received instrumentation data values, 
the length of time an application has been running, and/or the set of processes that are 
currently running. Furthermore, for each application A1 - NM, the survivability and
scalability capabilities of the application can be specified. This latter information includes
whether an application can be restarted if it fails, whether multiple copies of an application
can be run, what type of scalability the application supports (e.g., Primary-Shadow,
Load-Sharing, etc.), and the minimum and maximum number of copies that can be run.
Moreover, an estimate of the amount of CPU, memory, and network resources that the
application will use at run-time advantageously can be specified.

At the host level, the operating system and version, the hardware architecture, the 
host's network interface name, and the SPEC organization's SPECfp95 and SPECint95 
ratings for the host can be specified. At the network level, router and switch configurations 
and bandwidths can also be specified. 

Moreover, application data flow paths can be defined, including a graph of the data
flow between applications along with performance requirements tied to one or more of the
applications within the path. It should be mentioned that these defined requirements are
named and are tied at run-time to Instrumentation Event data provided by the Instrumentation 
Correlators FG26A- FG26N. Monitoring of the performance requirements is the 
responsibility of the QoS Manager components FG44A - FG44N, as discussed in greater 
detail below. 

As noted above, the System Specification Language provides a hierarchical structure 
for defining software and hardware systems. The current structure is shown below: 
Software Specifications
    Application
        Security
        Configuration
        Hardware Requirements
        Startup Info
            Dynamic Arguments
        Shutdown Info
        States
        Dependencies
        Initial Load Estimate
        QoS Info
        Survivability
        Scalability
Hardware Specifications
    Host Info
    Network Info
        LANs
        Network Devices (Interconnects)
Path Specifications
    Data Flow Graph
    Data Flow Info
    QoS Requirements

The specification information is accessed by linking in a specification parser library 
FG34 and making library calls to read in the files and convert them to an internal object 
model, and by making object access method calls to retrieve specific data items. The 
specification library is written in C++ and has been ported to all of the development
platforms in the testbed. The library is currently being used by most of the Resource 
Management components, including Program Control FG50, the Resource Manager FG42, 
the QoS Managers FG44A - FG44N, the Hardware Broker FG40, and the History Servers
FG12A-FG12N. 

It should be mentioned that the software used to construct the API library consists of
(1) a parser file that defines the grammar (in BNF format), (2) a lexical analyzer file that
defines the tokens of the language, and (3) a set of C++ System Specification classes for
storing the specification file information. The lexical analyzer file is compiled with the GNU
flex (lex) utility and the parser file is compiled using the GNU bison (yacc) utility. The flex
and bison utilities create C source files which are then compiled along with the C++ System
Specification object storage classes to create the System Specification Library (SSL) FG34.
This library is then linked with the Resource Management applications. An overview of this
structure is provided in FIG. 3; a more detailed discussion of the various functions is
provided below.

FG4 - RESOURCE ALLOCATION DECISION-MAKING FUNCTIONAL GROUP 

As illustrated in FIGS. 2A, 2B, the Resource Allocation Decision-Making functional
group provides the reasoning and decision-making capabilities of the Resource Management
architecture. The functions associated with this functional group employ the information
listed below to (1) determine the state and health of the distributed environment (hosts,
networks, and applications), and (2) determine what allocation and reallocation actions need
to be taken. The information provided to functional group FG4 includes:
System Specifications:
    Host configuration and capabilities
    Application capabilities
    Survivability
    Scalability
    Potential hosts to run on
    Application startup and shutdown dependencies
    Application and path performance requirements
Program Control:
    Application statuses
    Detected application faults
    Detected host failures
    Detection of new host
    Operator initiated requests
    Resolution of application startup or shutdown dependencies
    Selection of application-to-host mappings
History Servers:
    Host statuses, configuration, and loads
    Network link statuses and loads
Remos Network Data Broker:
    Network link statuses and loads
Instrumentation Subsystem:
    Application performance information
Readiness Display:
    Run-time changes to application system priorities

The subsystem components make decisions based on the following triggers and data:

• Based on requests from Program Control, determine where new applications
should be started

• Based on indication of application failure from Program Control, determine
whether and where the failed applications should be restarted

• Based on indication of host failure from Program Control (or indirectly from
Host Discovery), determine whether and where the failed applications should
be restarted

• Based on application inter-dependencies defined in the System Specification
Files, determine whether and where additional applications should be
started (or shut down) prior to starting (or shutting down) a particular
application

• Based on startup and shutdown dependency resolution requests from Program
Control, determine whether and where additional applications should be
started (or shut down) prior to starting (or shutting down) a particular
application

• Based on application instrumentation data and performance requirements
defined in the System Specification Files, determine whether applications are
meeting performance requirements and whether an application can be scaled
up or moved to attempt to improve performance

• Based on application instrumentation data and performance requirements
defined in the System Specification Files, determine whether applications are
performing well within performance requirements and can be scaled down

• Based on operator changes to application system priorities, determine
whether and where new applications need to be started and/or determine
whether and which existing applications need to be shut down

• Based on indication that a new host is on-line (from Host Discovery via
Program Control), issue startup orders to bring up a Program Control Agent,
Host Monitor, and Instrumentation Daemon on the new host, which will bring
the host under Resource Management control
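The dependency-related triggers above amount to ordering applications so that prerequisites start first. A minimal sketch using Python's `graphlib.TopologicalSorter` (Python 3.9 or later) is shown below; the dependency map format, the `startup_order` function, and the application names are hypothetical. Applications already running are skipped, and a circular dependency raises `graphlib.CycleError` rather than producing an order.

```python
import graphlib


def startup_order(target, depends_on, running=()):
    """Applications to start, in order, so `target` starts after its prerequisites.

    depends_on maps an application to the applications that must be
    running before it can start.
    """
    # Collect the target plus every prerequisite that is not already running.
    needed, stack = set(), [target]
    while stack:
        app = stack.pop()
        if app in needed or app in running:
            continue
        needed.add(app)
        stack.extend(depends_on.get(app, ()))
    # Predecessor graph restricted to the applications that must be started.
    graph = {app: [d for d in depends_on.get(app, ()) if d in needed] for app in needed}
    return list(graphlib.TopologicalSorter(graph).static_order())
```

Reversing such an order gives a shutdown sequence for the same set of applications in which dependents stop before the applications they rely on.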

The Resource Allocation Decision-Making functional group is implemented by the
three discrete functions listed below:

1) Resource Manager FG42 is the primary decision-making component of the
Resource Management Architecture. Resource Manager FG42 is responsible
for determining (1) how to respond to host and application failures, (2) where
to place new applications, (3) which applications to start up in response to the
detection of a new host, (4) how to resolve application dependencies, (5)
what applications should be started, stopped, or moved in response to
application system priority changes, and (6) based on recommendations from
the QoS Managers FG44A - FG44N, when and where scalable applications
should be started or stopped.

2) Host Load Analyzer FG40 is responsible for assigning a set of fitness scores
to each host based on host capabilities and loads.

3) QoS Managers FG44A - FG44N are responsible for monitoring application
and path requirements as defined in the System Specification Files FG32 and
recommending that applications be either scaled up, scaled down, or moved
in order to maintain acceptable performance.
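A QoS Manager recommendation can be illustrated with a simple threshold rule on a single latency-style requirement. The `slack` factor of 0.5, the choice to recommend a move when no further copies are allowed, and all names are assumptions for illustration; the actual requirement checks are driven by the System Specification Files FG32 and the events produced by the Instrumentation Correlators. The Resource Manager FG42 remains free to accept or reject the recommendation.

```python
def recommend(measured, required_max, copies, min_copies, max_copies, slack=0.5):
    """Return 'scale_up', 'move', 'scale_down', or 'none' for one requirement.

    measured and required_max are in the same units, e.g. milliseconds.
    """
    if measured > required_max:
        # Requirement violated: add a copy if allowed, otherwise suggest a move.
        return "scale_up" if copies < max_copies else "move"
    if measured < slack * required_max and copies > min_copies:
        # Comfortably within the requirement: a copy can be released.
        return "scale_down"
    return "none"
```

The gap between `slack * required_max` and `required_max` acts as a dead band, so an application running close to its requirement is neither scaled up nor down and does not oscillate between the two.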

As mentioned above, the Resource Manager FG42 is the primary decision-making 
component of the Resource Management Architecture. It is responsible for: 

(1) responding to application and host failures by determining if and what 
recovery actions should be taken; 

(2) determining if and where to place new copies of scalable applications or
which scalable applications should be shut down when the QoS Managers
indicate that scale-up or scale-down actions should be taken based on 
measured application performance; 

(3) determining where new applications should be placed when requested to do 
so by Program Control; and 

(4) determining which and how many applications should run based on 
application system (mission) priorities. 

In order to accomplish these tasks, the Resource Manager FG42 maintains a global 
view of the state of the entire distributed environment including status information on all 
hosts A - N, network 100, and applications A1 - NM. In addition, the Resource Manager FG42
also calculates software and hardware readiness metrics and reports these readiness values, 
for display purposes, to the display functional group FG6. 

It will be appreciated from FIGS. 2A, 2B that the Resource Manager FG42 receives
status and failure information about hosts, networks, and applications from the Program
Control function FG50. This information includes both periodic status updates and immediate
updates when statuses change, such as a new host being detected or an application failing. In
the case of application shutdown, information as to whether the application was shut down
intentionally or whether the application failed is also provided. The Program Control function
FG50 also issues requests to the Resource Manager FG42 when new applications need to be
dynamically allocated and when the Program Control function FG50 determines that the
Resource Manager FG42 needs to assess and attempt to resolve inter-application
dependencies (such as one application which needs to be running prior to starting up another
application).

The Resource Manager FG42 responds to faulted applications and hosts by 
determining whether the failed applications can and should be restarted and attempting to 
determine where (and if) there are hosts available that the application can run on. When a 
decision is made by the Resource Manager FG42, a message is sent to Program Control 
FG50 specifying what application to start and where to put it, i.e., which of hosts A - N to 
start the application on. The same general mechanism is used when Program Control FG50 
requests that the Resource Manager FG42 determine where to start new applications and/or 
how to resolve inter-application dependencies; the Resource Manager FG42 responds with 
orders indicating what applications to start and where to start them. The Resource Manager 
FG42 advantageously can send application shutdown instructions to Program Control FG50 
requesting that a certain application be stopped; this can occur when the QoS Managers 
FG44A-FG44N indicate that certain scalable applications have too many copies running or 
when application system priority changes (e.g., from a high priority to a lower priority)
result in scaling back the application system configuration.

The Resource Manager FG42 also receives host load and host fitness information on 
all known hosts from the Hardware Broker (Host Load Analyzer) FG40. This information
includes (1) overall host fitness scores, (2) CPU-based fitness scores, (3) network-based 
fitness scores, and (4) memory and paging-based fitness scores, along with (5) the SPEC95™ 
rating of the hosts. These scores are used by the Resource Manager FG42 for determining 
the "best" hosts for placing new applications when: 

(1) responding to requests from the QoS Managers to scale up additional copies
of an application; 

(2) attempting to restart failed applications; 

(3) responding to requests to dynamically allocate certain applications; and 

(4) responding to application system (mission) priority changes which require 
scaling up additional applications. 
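How fitness scores might be combined and used to pick the "best" host can be sketched as follows. The per-resource scores, the weights (0.5 CPU, 0.3 memory, 0.2 network), and the use of the SPEC rating as a tie-breaker are assumptions; the scoring formula used by the Host Load Analyzer FG40 is not given here. Keeping the full ranking, not just the winner, also yields the next-best host choices that can be reported to the Decision Review Displays.

```python
from dataclasses import dataclass

# Assumed weights for combining per-resource scores into an overall score.
WEIGHTS = {"cpu": 0.5, "memory": 0.3, "network": 0.2}


@dataclass
class HostLoad:
    name: str
    cpu_util: float     # fraction of CPU in use, 0.0 to 1.0
    mem_util: float     # fraction of memory in use, 0.0 to 1.0
    net_util: float     # fraction of link bandwidth in use, 0.0 to 1.0
    spec_rating: float  # e.g. a SPECint95-style rating


def fitness(host):
    """Per-resource scores in [0, 1] (higher is fitter) plus a weighted overall score."""
    scores = {
        "cpu": 1.0 - host.cpu_util,
        "memory": 1.0 - host.mem_util,
        "network": 1.0 - host.net_util,
    }
    scores["overall"] = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return scores


def rank_hosts(loads, eligible):
    """Eligible hosts, best first; ties on overall score go to the faster host."""
    candidates = [h for h in loads if h.name in eligible]
    return sorted(
        candidates,
        key=lambda h: (fitness(h)["overall"], h.spec_rating),
        reverse=True,
    )
```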

The Resource Manager FG42 also receives requests from the QoS Managers FG44A - FG44N
for scaling up, moving, or scaling down specific applications. The Resource Manager FG42 
responds to these requests by determining whether the request should be acted upon and, if 
so, determines the specific action to take. The Resource Manager FG42 then issues orders 
to Program Control FG50 to start up or shutdown specific applications on specific hosts. 

It should be noted that when the Resource Manager FG42 is first started, it reads in
the System Specification Files FG32 (via calls to System Specification Library FG34), which
contain the list of hosts that are known to be associated with the distributed environment
and information on all applications that can be run in the distributed environment. The 
application-level information includes where, i.e., on which host, specific applications can 
be run, which applications are scalable, which applications can be restarted, and any 
dependencies between applications. 

The Resource Manager FG42 currently responds to application system priority 
changes received from the Readiness Broker (translation software in or associated with the 
Readiness Display FG66) in the following manner: 
(1) If the priority is changed to None, all applications associated with the
specified system are shut down.

(2) If the priority is changed to Low, all scalable applications within the specified
system are scaled back to no more than 50% of potential maximum
scalability and are not allowed to be scaled up past the 50% limit, regardless
of performance.

(3) If the priority is changed to Medium, normal scaleup and scaledown
functionality is allowed.

(4) If the priority is changed to High, all scalable applications are scaled up to at
least 50% of potential maximum scalability and are not allowed to be scaled
down to less than 50%, regardless of performance.

(5) If the priority is changed to Urgent, all scalable applications are scaled up to 
100% (for maximum survivability) and are not allowed to be scaled down. 

Moreover, if the previous priority was None and the new priority is higher than
None, all required applications within the specified system are started up, subject to the
limitations outlined above for the new priority level.
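The priority rules above amount to a floor and a ceiling on the number of running copies of each scalable application. The following is a minimal sketch under several assumptions: the names (`Priority`, `copy_limits`, `clamp_copies`) are hypothetical, 50% is rounded down for the Low ceiling and rounded up for the High floor, and an application's minimum copy count is honored at every priority except None.

```python
import math
from enum import Enum


class Priority(Enum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    URGENT = 4


def copy_limits(priority, min_copies, max_copies):
    """Return (floor, ceiling) on running copies of one scalable application."""
    half_down = max_copies // 2              # Low: "no more than 50%" (assumed rounded down)
    half_up = math.ceil(max_copies / 2)      # High: "at least 50%" (assumed rounded up)
    if priority is Priority.NONE:
        return (0, 0)                        # everything in the system is shut down
    if priority is Priority.LOW:
        return (min_copies, max(min_copies, half_down))
    if priority is Priority.MEDIUM:
        return (min_copies, max_copies)      # normal QoS-driven scaling
    if priority is Priority.HIGH:
        return (max(min_copies, half_up), max_copies)
    return (max_copies, max_copies)          # URGENT: every copy, no scale-down


def clamp_copies(priority, min_copies, max_copies, requested):
    """Clamp a QoS-driven copy count to the limits imposed by the priority."""
    floor, ceiling = copy_limits(priority, min_copies, max_copies)
    return max(floor, min(requested, ceiling))
```

With this split, the QoS Managers keep proposing copy counts, and the priority level only bounds what the Resource Manager will accept.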

The Resource Manager FG42 also sends information about allocation and
reallocation decisions to the Resource Management Decision Review Displays FG68A-FG68N,
as discussed in greater detail below. Information on the decision that was made,
the event that prompted the decision, and how long it took both to make and to implement
the decision is advantageously also sent to the display functional group FG6.
In addition, information about the alternative hosts on which an application could
potentially have been placed is also provided (if applicable); in an exemplary case, this
information includes the host fitness scores for the selected host and for the next-best host
choices that could have been selected.
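The host choice and the decision record sent to the Decision Review Displays could be represented roughly as follows. This is only a sketch: the class and field names are hypothetical, and it assumes that a higher overall fitness score marks a better host, which the text does not state. Implementation time would be measured separately, once Program Control reports the start.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HostFitness:
    host: str
    overall: float
    cpu: float
    network: float
    memory: float
    spec95: float


@dataclass
class DecisionRecord:
    application: str
    trigger: str                       # e.g. "QoS scale up request", "application failure"
    chosen: HostFitness
    alternatives: list = field(default_factory=list)
    decide_seconds: float = 0.0


def place_application(application, trigger, fitness, eligible_hosts, n_alternatives=2):
    """Pick the eligible host with the best overall score (higher assumed better)
    and keep the runners-up so the review display can show them."""
    start = time.perf_counter()
    ranked = sorted((f for f in fitness if f.host in eligible_hosts),
                    key=lambda f: f.overall, reverse=True)
    if not ranked:
        raise LookupError(f"no eligible host for {application}")
    record = DecisionRecord(application, trigger, ranked[0],
                            ranked[1:1 + n_alternatives])
    record.decide_seconds = time.perf_counter() - start
    return record
```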

As described above, the Resource Manager FG42 communicates with Program
Control FG50, the Hardware Broker FG40, the QoS Managers FG44A-FG44N, QoS
Specification Control (not shown), the Readiness Broker of display FG66, the Globus Broker
(e.g., message translation software (not shown)), and the RM Decision Review Displays
FG68A-FG68N using the RMComms middleware, which will be discussed in greater detail
below.

The Hardware Broker (Host Load Analyzer) FG40 is the host load analysis 
component of the Resource Management Architecture, which is primarily responsible for 
determining the host and network loads on each host A - N within the distributed computing 
environment. The Hardware Broker FG40 assigns a set of fitness scores for each host and 
periodically provides the list of fitness scores to the Resource Manager FG42. 

The Hardware Broker FG40 advantageously receives operating system-level statuses
and statistics for each host A-N from the History Server(s) FG12A-FG12N, respectively.
This information can be employed for calculating CPU, network, memory, paging activity,
and overall fitness scores for each of the hosts A-N. Preferably, the Hardware Broker FG40
periodically, e.g., once per second, provides the complete list of host fitness scores to the
Resource Manager FG42.
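The fitness formulas themselves are not given in the text. Purely as an illustration, the sketch below assumes utilizations normalized to 0.0-1.0, per-resource scores of one minus utilization, a memory score penalized by whichever is worse of memory use or paging rate, and an overall score formed as a weighted mean. The weights, the paging saturation rate, and all names are arbitrary placeholders.

```python
from dataclasses import dataclass


def _clamp(x):
    return max(0.0, min(1.0, x))


@dataclass
class HostStats:
    cpu_util: float                  # fraction of CPU busy, 0.0-1.0
    net_util: float                  # fraction of subnet bandwidth in use, 0.0-1.0
    mem_util: float                  # fraction of physical memory in use, 0.0-1.0
    page_rate: float                 # pages per second
    max_page_rate: float = 1000.0    # hypothetical "fully thrashing" rate


def fitness_scores(stats, weights=(0.4, 0.3, 0.3)):
    """Return per-resource and overall scores in 0.0-1.0 (1.0 = idle host)."""
    cpu = 1.0 - _clamp(stats.cpu_util)
    network = 1.0 - _clamp(stats.net_util)
    memory = 1.0 - max(_clamp(stats.mem_util),
                       _clamp(stats.page_rate / stats.max_page_rate))
    w_cpu, w_net, w_mem = weights
    overall = (w_cpu * cpu + w_net * network + w_mem * memory) / (w_cpu + w_net + w_mem)
    return {"cpu": cpu, "network": network, "memory": memory, "overall": overall}
```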

It should be noted that when the Hardware Broker FG40 is first started, it reads in the
System Specification Files FG32 (via calls to the System Specification Library (SSL) FG34),
which contain the list of hosts that are known to be in the distributed environment. The
Hardware Broker FG40 also receives information about the bandwidth and maximum packet
sizes of all known network subnets in the distributed environment, e.g., by reading in a file
containing that information. It will be appreciated that this data advantageously can be used
for converting host network load information based on packet counts into load information
based on bytes per second and percentage of available bandwidth.
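Converting packet counts into bytes per second needs a per-packet size. The minimal sketch below assumes that every packet is counted at the subnet's maximum packet size, so the result is an upper bound. A hypothetical in-memory table stands in for the subnet file.

```python
import ipaddress

# Hypothetical stand-in for the subnet file: network -> (bandwidth in bits/s, max packet bytes).
SUBNETS = {
    ipaddress.ip_network("192.168.1.0/24"): (100_000_000, 1500),
}


def subnet_params(ip, subnets):
    """Return (bandwidth_bps, max_packet_bytes) for the subnet containing ip."""
    addr = ipaddress.ip_address(ip)
    for net, params in subnets.items():
        if addr in net:
            return params
    raise KeyError(f"no known subnet for {ip}")


def packets_to_load(packets_per_sec, max_packet_bytes, bandwidth_bps):
    """Upper-bound conversion: (bytes per second, percent of available bandwidth)."""
    bytes_per_sec = packets_per_sec * max_packet_bytes
    percent = 100.0 * bytes_per_sec * 8 / bandwidth_bps
    return bytes_per_sec, percent
```

For example, 1000 packets/s on a 100 Mbit/s subnet with 1500-byte packets gives at most 1,500,000 bytes/s, or 12% of the bandwidth.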
Periodically, e.g., approximately every three seconds, the Hardware Broker FG40
transmits a list of overall and network host fitness scores to the Hardware Broker
Instrumentation Display, which was constructed using the Graph Tool Instrumentation
Displays FG69A-FG69N. Moreover, the Hardware Broker FG40 advantageously can receive
host-based network load data from the Remos Network Data Broker function FG16, which
receives network data via the Remos Network Monitoring software 2. It should be noted that
if Remos network data is available for any of the hosts A-N that are being monitored, the
Remos-reported network data advantageously can be used for calculating the network fitness
score for that host, rather than the host network data received from the History
Server(s) FG12A-FG12N.

The QoS Managers FG44A-FG44N of functional group FG4 are responsible for
monitoring application-level performance requirements. These requirements are defined in
the System Specification Files FG32 and are monitored primarily via instrumentation data
obtained directly from the application code. The QoS Managers FG44A-FG44N
advantageously can determine whether applications or application paths are meeting their
assigned requirements. If an application is not meeting its performance requirements and the
application is scalable (in the sense that multiple copies can be run and the copies will
share the load among themselves), the QoS Managers FG44A-FG44N will request
that the Resource Manager FG42 either scale up a new copy of the application or move the
application to a new host (in an attempt to achieve better performance). Moreover, if there
are multiple copies of a scalable application running, and all copies are performing well
below the specified requirement threshold, the QoS Managers FG44A-FG44N will request
that the Resource Manager FG42 shut down a specific copy. It should be noted that the
division of responsibility between the QoS Managers FG44A-FG44N and the Resource
Manager FG42 is that the QoS Managers determine which actions would potentially improve
performance, while the Resource Manager has final authority to determine whether to
implement the requested action(s).
Each of the QoS Managers FG44A-FG44N can be scaled both for redundancy and
for load-sharing. In an exemplary case, each copy of the QoS Manager monitors all of the
requirements associated with a single application path defined in the System Specification
Files FG32. It will be appreciated that the specific path to be monitored can be specified via
command-line parameters. By default, when no path is specified on the command line, the
QoS Managers FG44A-FG44N monitor all requirements for all paths defined in the
System Specification Files FG32.
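Path selection on the command line might look like the following sketch. The `--path` option name is hypothetical. The option may be repeated to monitor several paths, and omitting it selects every path known from the specification files.

```python
import argparse


def parse_qos_args(argv, known_paths):
    """Return the application paths this QoS Manager copy should monitor."""
    parser = argparse.ArgumentParser(description="QoS Manager (sketch)")
    parser.add_argument("--path", action="append", dest="paths",
                        help="application path to monitor; may be repeated")
    args = parser.parse_args(argv)
    if not args.paths:
        return list(known_paths)      # default: every path in the spec files
    unknown = set(args.paths) - set(known_paths)
    if unknown:
        parser.error(f"unknown path(s): {', '.join(sorted(unknown))}")
    return args.paths
```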

It should be mentioned that, in one exemplary embodiment, the QoS Managers
FG44A-FG44N each employ a sliding window algorithm to determine when to declare that
applications should be scaled up or scaled down. The inputs to the algorithm define both high
and low sampling window sizes, the maximum number of allowed violations within the
sampling window, and violation thresholds expressed as a percentage of the actual specified
requirement value. It should also be mentioned that the sliding window algorithm was
selected in order to damp out unexpected "noise" or "spikes" in the measured performance
data. Moreover, the threshold was expressed as a percentage of the actual requirement value
so that an application is scaled up, or scaled down, before the specified hard requirement is
violated. The QoS Managers FG44A-FG44N provide application scale up and scale down requests
to the Resource Manager FG42 when the measured performance data for an associated
application violates either the high (scale up) or low (scale down) sliding window criteria for
a specific requirement. A scale up request indicates which application on which host has
violated the performance criteria, and a scale down request indicates which application on
which host is recommended to be shut down. It will be appreciated that the success of this
algorithm is highly dependent on the rate of change and noisiness of the measured data.
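The following minimal sketch shows the sliding window check for one requirement of one application copy. Several details are assumptions not given in the text: the requirement is an upper bound (e.g., a maximum latency), both windows share a single violation limit, the default thresholds are 80% and 30% of the requirement, and both windows are cleared after a request is issued.

```python
from collections import deque


class SlidingWindowMonitor:
    """Sliding-window check for one requirement of one application copy."""

    def __init__(self, requirement, high_window, low_window,
                 max_violations, high_pct=80.0, low_pct=30.0):
        self.high_threshold = requirement * high_pct / 100.0
        self.low_threshold = requirement * low_pct / 100.0
        self.max_violations = max_violations
        self.high = deque(maxlen=high_window)   # True = sample above high threshold
        self.low = deque(maxlen=low_window)     # True = sample below low threshold

    def add_sample(self, value):
        """Record one measurement; return 'scale_up', 'scale_down', or None."""
        self.high.append(value > self.high_threshold)
        self.low.append(value < self.low_threshold)
        if sum(self.high) > self.max_violations:
            action = "scale_up"
        elif sum(self.low) > self.max_violations:
            action = "scale_down"
        else:
            return None                          # isolated spikes are absorbed
        # Start over so one burst of bad samples does not trigger repeated requests.
        self.high.clear()
        self.low.clear()
        return action
```

With a 100 ms requirement, a five-sample high window, and at most two violations, the sequence 90, 50, 50, 90, 50 produces no request, but three consecutive 90 ms samples produce a scale up request.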


Any of the QoS Managers FG44A-FG44N can also request that the Resource
Manager FG42 move an application. This will occur in the case where one copy of an
application is performing much worse than all other running copies of the same application.
In an exemplary case, the move request is implemented as a scale up request followed by a
scale down request (of the badly performing copy). In that case, the scale down request does
not get sent to the Resource Manager FG42 until the scale up action has been implemented.
The QoS Managers FG44A-FG44N preferably employ application "settling times," defined
in the System Specification Files FG32, to ensure that, once a requested action has been sent
to the Resource Manager FG42, no additional actions are requested for that application
until the settling time has elapsed. It will be appreciated that this provides time for
initialization and configuration among the application copies to occur. Alternatively, System
Specification Language inter-application dependency definitions advantageously can be used
instead of settling times.
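The settling-time rule and the two-step move can be sketched as two small helpers. The names are hypothetical. `send` stands in for whatever transport carries requests to the Resource Manager FG42, and `on_application_started` assumes that Program Control FG50 status updates report the host of each newly started copy. The clock is injectable so the gate can be tested.

```python
import time


class SettlingGate:
    """Suppresses further requests for an application until its settling time passes."""

    def __init__(self, settling_seconds, clock=time.monotonic):
        self.settling_seconds = settling_seconds   # app name -> seconds (from spec files)
        self.clock = clock
        self.last_request = {}

    def try_send(self, app, request, send):
        last = self.last_request.get(app)
        if last is not None and self.clock() - last < self.settling_seconds.get(app, 0.0):
            return False                           # still settling: drop the request
        send(request)
        self.last_request[app] = self.clock()
        return True


class PendingMove:
    """A move: scale up now, scale down the bad copy once the new copy is running."""

    def __init__(self, app, bad_host, send):
        self.app, self.bad_host, self.send = app, bad_host, send
        self.finished = False
        send(("scale_up", app, bad_host))

    def on_application_started(self, app, host):
        if not self.finished and app == self.app and host != self.bad_host:
            self.send(("scale_down", app, self.bad_host))
            self.finished = True
```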

The QoS Managers FG44A-FG44N also receive application status and state
information from Program Control FG50, which periodically sends application status updates
for all running applications and also sends immediate indications of any applications that
have been started or stopped. This information is used by the QoS Managers FG44A-FG44N,
along with the instrumented performance data being received via the QoS Monitor
FG29 and Instrumentation Correlator FG34, to determine the exact state of all monitored
applications that are running. This information is also used to determine when (and if)
requested actions have been implemented by the Resource Manager FG42. The information
is also used for setting up and discarding internal data structures used for monitoring the
performance of each application A1-NM.

It will be appreciated that the QoS Managers FG44A-FG44N also receive
application-level instrumentation data indicating current application performance values
from the Instrumentation Correlators (Brokers) FG26A-FG26N, the Distrumentation Brokers
FG28A-FG28N, and/or the Jewel Instrumentation Broker (QoS Monitor) FG29. The
instrumentation data that is received contains, at a minimum, (1) the timetag when the data
was generated, (2) the hostname and IP address of the host on which the associated
application is running, (3) the process id (pid) of the associated application,
and (4) the event number of the instrumentation message. Preferably, the
event number of the instrumentation message specifies the type of instrumentation data that
has been received; the hostname, IP address, and pid are used, in conjunction with the
application data received from Program Control FG50, to determine the specific application
with which the data is associated.
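Mapping an instrumentation message to an application with the help of Program Control data could be sketched as below. The field names, including the `value` payload, are assumptions. The sketch simply indexes running applications both by hostname and pid and by IP address and pid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentationMessage:
    timetag: float       # when the data was generated
    hostname: str
    ip_address: str
    pid: int
    event_number: int    # identifies the type of instrumentation data
    value: float         # hypothetical measured-value payload


class ApplicationIndex:
    """Maps (host, pid) to an application name using Program Control status data."""

    def __init__(self):
        self._by_host_pid = {}

    def application_started(self, app, hostname, ip_address, pid):
        self._by_host_pid[(hostname, pid)] = app
        self._by_host_pid[(ip_address, pid)] = app

    def application_stopped(self, hostname, ip_address, pid):
        self._by_host_pid.pop((hostname, pid), None)
        self._by_host_pid.pop((ip_address, pid), None)

    def resolve(self, msg):
        """Return the application that produced msg, or None if unknown."""
        return (self._by_host_pid.get((msg.hostname, msg.pid))
                or self._by_host_pid.get((msg.ip_address, msg.pid)))
```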

When the contents of the instrumentation message match any of the application
performance requirements that are currently being monitored by the QoS Managers
FG44A-FG44N, the data value is added to the proper requirement sliding window for the specified
application. The sliding window algorithm is then checked to determine whether the new sample
triggered a violation of either the high or the low sliding window. If a high threshold sliding
window violation occurs and the application does not already have the maximum number of
copies running, a determination is made as to whether performance can best be improved by
starting a new copy of the application (scale up) or by moving an existing copy to a different
host. The corresponding action recommendation will then be sent to the Resource Manager FG42. In
an exemplary case, the criterion for determining whether an application should be moved
rather than scaled up is the relative performance of the replicated applications. More
specifically, if one copy is performing much worse (by more than 50%) than the other copies, the
recommendation will be to move that copy. Likewise, if the new sample triggers a low
threshold sliding window violation and the application has more than the minimum number
of copies running, a recommendation will be sent to the Resource Manager FG42 requesting
that the copy of the application that is experiencing the worst performance be scaled down.
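The move-versus-scale-up decision and the scale-down choice can be sketched as follows. The sketch assumes that a larger measured value means worse performance and reads "much worse (by more than 50%)" as exceeding 1.5 times the mean of the other copies. The function names are hypothetical, and a scale up request leaves host selection to the Resource Manager.

```python
def on_high_violation(perf_by_host, max_copies, move_ratio=1.5):
    """perf_by_host: latest measured value per running copy (higher = worse)."""
    if len(perf_by_host) >= max_copies:
        return None                                  # already at maximum copies
    if len(perf_by_host) >= 2:
        worst = max(perf_by_host, key=perf_by_host.get)
        others = [v for h, v in perf_by_host.items() if h != worst]
        if perf_by_host[worst] > move_ratio * (sum(others) / len(others)):
            return ("move", worst)                   # one copy is the outlier
    return ("scale_up", None)                        # Resource Manager picks the host


def on_low_violation(perf_by_host, min_copies):
    """Recommend shutting down the worst-performing copy, if above the minimum."""
    if len(perf_by_host) <= min_copies:
        return None
    worst = max(perf_by_host, key=perf_by_host.get)
    return ("scale_down", worst)
```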

FG5 - RESOURCE (APPLICATION) CONTROL FUNCTIONAL GROUP 

As discussed above, the Resource Control capabilities provided by the Resource
Management Architecture consist of controlling application startup, configuration, and
shutdown on hosts within the distributed environment. This capability, known as Application
Control or Program Control (hereafter referred to as Program Control), provides a powerful
distributed configuration capability. The Program Control capabilities permit an operator to
start up and control applications running on platforms throughout the distributed environment
via an easy-to-use interactive display. These capabilities are provided by the Application
Control functional group FG5.

More specifically, the Application Control functional group provides application
control (i.e., Program Control) capabilities which permit starting, stopping, and configuring
applications on each of the hosts in the distributed environment. The functional group
provides both interactive operator control of the distributed environment and automatic
control via configuration orders received from the Resource Allocation Decision-Making
functional group FG4, i.e., the Resource Manager component. The interactive controls allow
an operator to create, load, save, and edit pre-defined system configurations, e.g., lists of
applications that are to be run, with or without specific host mappings; to determine the status
and configuration of currently running programs; and to start and stop any or all applications.
Both static (operator-entered) mappings of applications to hosts and dynamic mappings of
applications to hosts (where the Resource Allocation Decision-Making functional group FG4
will be queried to determine the proper mapping at run-time) advantageously can be defined.
The functional group also provides application fault detection capabilities, which are
triggered by the unexpected death, i.e., fault, of an application that was started by the
functional group. A basic host fault detection capability is also provided, which is triggered
by failure to receive heartbeat messages from functional group components running
on a particular host.

A brief description of each function provided by the functional group FG5 is provided
below; a detailed discussion of the Resource Control functional group FG5 and associated
data flow will be provided in discussing FIG. 4.
1) Program Control Agents FG52A-FG52N - A Program Control agent,
generally denoted FG52, resides on each of the hosts A-N (i.e., PCA-PCN).
Each agent is responsible for providing direct control over application startup 
and shutdown of applications on its respective host. The agent receives 
control orders from the Program Control function FG50 and is then 
responsible for implementing the orders. In an exemplary case, the agents 
implement the orders via system call mechanisms specific to the particular 
operating system. In addition, the agent also provides feedback to the Control 
function FG50 regarding the current status of all applications running on a 
particular host. 

2) Program Control FG50 - maintains the application state information for the
Program Control functional group FG5. It also serves as the decision-making
component of the Program Control functional group. The Control function
FG50 receives application control (startup, shutdown, or configuration)
requests from the Program Control Displays FG54A-FG54N and from the
Resource Management functional group FG4. Using information from the
Specification Files FG32, these high-level control function requests are
dynamically translated into specific control orders, which are sent to the
individual Program Control agents FG52A-FG52N. Program Control FG50
also provides application status and configuration information back to the
Resource Manager FG42.

3) Program Control Displays FG54A-FG54N - serve as the GUI for
interactive control of distributed applications. The Program Control Displays
FG54A-FG54N allow an operator to see and control the status of
applications running on each host in the distributed environment. The
Program Control Displays FG54A-FG54N also provide the user the ability
to determine the status of each of the components of the Program Control
architecture. Predefined scenario configurations defined in Program Control
Configuration Files FG56 advantageously can be loaded and edited via the
Displays. It should be mentioned that new Program Control Configuration
Files can also be created and saved via the Displays. As illustrated in FIGS.
2A, 2B, multiple Program Control Displays FG54A-FG54N can be run
simultaneously, with application status changes being reflected at each
display.

4) Configuration Files FG56 - contain an ordered set of applications that can 
be loaded at the Program Control display and then either edited or executed. 
The Configuration Files can contain both dynamic and static application-to- 
host mappings. For static application-to-host mappings, an application will, 
by default, be started on a specified host. For dynamic application-to-host 
mappings, the application will have a default host to start on but the Resource 
Manager FG42 will be queried at run-time to determine where the application 
actually should be placed. The Configuration Files FG56 also contain all 
information on how to start, stop, and configure an application, with the 
exception of environment variable settings for the application which are set 
based on the System Specification Files FG32. 

It should be mentioned here that the Program Control functional group employs the
application startup and shutdown information defined in the System Specification Files
FG32. When an application entry is first created interactively at one of the Program Control
Displays FG54A-FG54N, all of the startup and shutdown information for that application,
as specified in the System Specification Files FG32, is loaded in as default settings. Once
a configuration file entry has been created, all configuration information on the application
is read in from the configuration file, except for the application environment variable settings,
which are still set based on the System Specification Files FG32.

As mentioned above, a Program Control agent resides on each host. The agent is
responsible for providing direct control over application startup and shutdown. The agent
receives control orders from the Control component and is then responsible for implementing
the orders. Each of the PC Agents FG52A-FG52N implements application startup and
shutdown orders via system call mechanisms specific to the particular operating system of
the host. For example, on Unix platforms, the fork() and execv() function calls are used to
create the application process, and the csh command is executed to start up the
application. Moreover, if the application needs to run in a console, an xterm is configured
for the application to run in. In addition, if logging of either stdout or stderr is specified, the
proper redirection operators are configured and the output log file is set to
"/usr/tmp/<userid>_<appname>_<pid>.log". All environment variables needed by the
application are also configured and passed in at the execv() call. The current working
directory is also set by the chdir() call, and the new application is made a process group
leader via the setpgid() function. Other operating systems invoke applications using different
calls.
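Below is a Python sketch of the Unix startup sequence just described, using `os.fork`, `os.setpgid`, `os.chdir`, and `os.execve`. It swaps in `/bin/sh` for csh, because stderr-only redirection is awkward in csh, and it omits the xterm console case. The function names are hypothetical.

```python
import os
import shlex


def log_file_path(log_dir, userid, appname, pid):
    """Build <log_dir>/<userid>_<appname>_<pid>.log."""
    return os.path.join(log_dir, f"{userid}_{appname}_{pid}.log")


def shell_command(exe, args, log_type=None, log_path=None):
    """Compose a /bin/sh command line with the redirection that LogType asks for."""
    cmd = " ".join(shlex.quote(part) for part in [exe, *args])
    if log_type is None:
        return cmd
    target = shlex.quote(log_path)
    redirect = {"STDOUT": f"> {target}",
                "STDERR": f"2> {target}",
                "LOG_ALL": f"> {target} 2>&1"}[log_type]
    return f"{cmd} {redirect}"


def start_application(exe, args, startup_dir, env, appname,
                      log_type=None, log_dir="/usr/tmp"):
    """fork(), make the child a process group leader, chdir(), then exec a shell."""
    pid = os.fork()
    if pid == 0:                                   # child
        try:
            os.setpgid(0, 0)                       # new group whose id is the child's pid
            os.chdir(startup_dir)
            userid = os.environ.get("USER", str(os.getuid()))
            cmd = shell_command(exe, args, log_type,
                                log_file_path(log_dir, userid, appname, os.getpid()))
            os.execve("/bin/sh", ["sh", "-c", cmd], env)
        finally:
            os._exit(127)                          # reached only if exec failed
    try:
        os.setpgid(pid, pid)                       # also set in the parent to avoid a race
    except OSError:
        pass                                       # child already exec'd or set it itself
    return pid
```

Making the child a group leader is what later allows a single `killpg()` to reach the application together with any shell or helper processes it spawned.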

In order to stop an application on Unix platforms, if a signal is to be sent to the
application, the killpg() function is used; otherwise, if a script or command is to be executed
to shut down the application, the csh command is executed (via the system() function),
specifying the full path and executable name of the command along with any arguments for
the command. It should be noted that if the application default shutdown time elapses and
the application has not died, the respective one of the Program Control Agents FG52A-FG52N
advantageously sends a SIGKILL signal to the application by calling killpg().
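The stop sequence, including its SIGKILL escalation, can be sketched in Python as follows. The sketch assumes the application is a child of the calling process and leads its own process group. `os.system` runs commands through `/bin/sh` rather than csh, and the function name and parameters are hypothetical.

```python
import os
import signal
import time


def stop_application(pgid, shutdown, timeout=5.0, poll=0.1):
    """Stop a child process that leads its own process group.

    shutdown is either a signal (sent to the whole group with killpg) or a
    shell command string (run with os.system). If the group leader has not
    exited after `timeout` seconds (the default shutdown time), the group is
    sent SIGKILL. Returns "exited" or "killed".
    """
    if isinstance(shutdown, signal.Signals):
        os.killpg(pgid, shutdown)
    else:
        os.system(shutdown)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pid, _status = os.waitpid(pgid, os.WNOHANG)   # (0, 0) while still running
        if pid == pgid:
            return "exited"
        time.sleep(poll)
    os.killpg(pgid, signal.SIGKILL)
    os.waitpid(pgid, 0)                               # reap the killed leader
    return "killed"
```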

As illustrated in FIGS. 1A, 1B, the Program Control Agents (PCA-PCN)
advantageously can be instantiated on stand-alone hosts A-N. In that case, the Program
Control Agents PCA-PCN (FG52A-FG52N in FIGS. 2A, 2B) send heartbeat messages to
Program Control FG50 approximately once per second to indicate that they are still "up and
running." Moreover, every ten seconds, the Program Control Agents PCA-PCN (FG52A-FG52N)
send complete configuration information on all running applications to Program
Control FG50. It should be noted that the terminology employed in FIGS. 1A, 1B differs
from that in FIGS. 2A, 2B to emphasize the distinction between software instantiated on a
host and a function provided by the Resource Management Architecture.

The Program Control function FG50 is the decision-making component of the
Program Control functional group FG5. It maintains complete information on everything that
is running across all platforms in the distributed environment. The Program Control function
FG50 receives input data from PCA-PCN (FG52A-FG52N), the Program Control Displays
FG54A-FG54N, the Resource Manager FG42, and the Host Discovery function FG14.

It will be appreciated from the preceding discussion that Program Control FG50
provides startup and shutdown orders to the Program Control Agents FG52A-FG52N based
on operator-initiated or Resource Manager-initiated orders. If the Program Control Agents report
that an application has terminated abnormally, Program Control FG50 provides a notification
to the Resource Manager FG42, to the Program Control Displays FG54A-FG54N, and to
any other component to which it is connected. When the Program Control function FG50 is
first brought up, it can be configured to attempt to start Program Control agents on every host
defined in the System Specification Files. The Program Control function FG50 will also
attempt to start a Program Control Agent on a newly discovered host (discovered via the
Host Discovery function FG14) if Host Discovery has been enabled on the Program Control
Displays FG54A-FG54N.

The Program Control function FG50 also receives periodic heartbeat messages, e.g.,
once per second, from each of the Program Control Agents FG52A-FG52N, as discussed
above. If Fault Detection has been enabled at the Program Control Displays FG54A-FG54N
and three consecutive heartbeat messages from an Agent, e.g., FG52A, are missed, the host that
the agent is running on is declared down and all linked functions, including the Resource
Manager FG42 and the Displays FG54A-FG54N, are notified.
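Host fault detection based on missed heartbeats can be sketched as below. The class name is hypothetical. The sketch treats "three consecutive missed heartbeats" as more than three heartbeat periods elapsing without a message, and it takes an injectable clock so that it can be tested.

```python
import time


class HeartbeatMonitor:
    """Declares a host down after `allowed_misses` consecutive missed heartbeats."""

    def __init__(self, period=1.0, allowed_misses=3, clock=time.monotonic):
        self.limit = period * allowed_misses
        self.clock = clock
        self.last_seen = {}
        self.down = set()

    def heartbeat(self, host):
        """Record a heartbeat from the Program Control Agent on `host`."""
        self.last_seen[host] = self.clock()
        self.down.discard(host)           # a heartbeat brings a host back up

    def check(self):
        """Return hosts newly declared down since the previous check."""
        now = self.clock()
        newly_down = [h for h, seen in self.last_seen.items()
                      if h not in self.down and now - seen > self.limit]
        self.down.update(newly_down)
        return newly_down
```

Calling `check()` periodically and forwarding any returned hosts to the Resource Manager and Displays mirrors the notification step described above.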

As mentioned above, the Program Control function FG50 sends out periodic 
application status updates as well as immediate notification when applications are started up, 
are shut down, or fail. These notifications are sent out to all linked functions.

It should be noted that the Program Control function FG50 uses the same message
traffic and internal processing for handling application startup and shutdown orders received
from either the Resource Manager FG42 or the Program Control Displays FG54A-FG54N.
However, if a startup order received from one of the Program Control Displays
FG54A-FG54N indicates that the Resource Manager FG42 should determine where to run
the application, a request to allocate the application is sent to the Resource Manager FG42.
When no response is received from the Resource Manager FG42 within a predetermined
timeout period, the Program Control function FG50 will automatically start the application
on the default host. Moreover, when an application startup cannot proceed due to an
unfulfilled application startup dependency, a request will be made to the Resource Manager
FG42 to attempt to resolve the dependency. If the Resource Manager FG42 either cannot
resolve the dependency or does not respond within a predetermined timeout period,
the application startup will fail, and a "dependency failed" indication will be sent to the
Display. It will be appreciated that this will cause the application status to be displayed in,
for example, yellow and an alert to be posted to the Alert window on one of the Program Control
Displays FG54A-FG54N.
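The allocate-or-fall-back behavior for dynamically mapped applications can be sketched as follows. `ask_resource_manager` is a hypothetical hook that sends the allocation request and posts the Resource Manager's answer on a queue. The timeout value is a placeholder for the predetermined timeout.

```python
import queue


def resolve_start_host(default_host, rm_start, ask_resource_manager, timeout=5.0):
    """Return the host on which to start an application.

    ask_resource_manager(reply_queue) is a hypothetical hook that sends the
    allocation request and later puts the chosen host name on reply_queue.
    """
    if not rm_start:
        return default_host                 # static mapping: use the configured host
    replies = queue.Queue()
    ask_resource_manager(replies)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return default_host                 # no answer in time: fall back to default
```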

Preferably, the Program Control function FG50 also handles simple startup timing
dependencies between applications and will reorder a list of applications that were selected
to be started simultaneously if doing so will resolve startup order dependencies between the
applications. Otherwise, the Program Control function FG50 sends a request to the Resource
Manager FG42 to attempt to resolve the dependencies.
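Reordering a selected list so that each application starts after the applications it depends on is a topological sort. Below is a minimal sketch using Python's `graphlib` module (Python 3.9 or later). It assumes that dependencies are given as a mapping from each application to its prerequisites and ignores prerequisites outside the selected list. When there is a cycle, the function returns `None`; that is the case that would be handed to the Resource Manager.

```python
from graphlib import CycleError, TopologicalSorter


def reorder_for_startup(selected, depends_on):
    """Return `selected` reordered so prerequisites start first, or None on a cycle."""
    position = {app: i for i, app in enumerate(selected)}
    sorter = TopologicalSorter()
    for app in selected:
        prereqs = [d for d in depends_on.get(app, ()) if d in position]
        sorter.add(app, *prereqs)
    try:
        sorter.prepare()
    except CycleError:
        return None
    order = []
    while sorter.is_active():
        # Among applications that are ready, keep the operator's original order.
        ready = sorted(sorter.get_ready(), key=position.__getitem__)
        order.extend(ready)
        sorter.done(*ready)
    return order
```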

The Program Control Display serves as the operator console for controlling the 
distributed environment. From the Display, shown in FIGS. 5A, 5B, the operator can: 

1) see the status and configuration of currently executing applications A1-NM;

2) see the status of Program Control Agents PCA-PCN on each host A-N;

3) see and browse the application system structure defined in the System
Specification Files FG32;

4) load configuration files FG56;

5) save configuration files FG56;

6) edit the configuration of applications that are not currently running;

7) create new application entries by dragging an application, application system,
or application subsystem icon onto the application status area;

8) manually start specific applications;

9) manually stop specific applications;

10) manually start all applications that have the "Start All" flag set;

11) manually stop all applications;

12) turn host fault detection on or off (if on, loss of 3 consecutive heartbeats from
a Program Control Agent will result in declaring the host down); and

13) turn host discovery on or off (if on, a new host message from the Host
Discovery component will result in attempting to start up a Program Control
Agent on the new host).

It will be appreciated from FIGS. 2A, 2B that multiple Program Control Displays
FG54A-FG54N advantageously can be run simultaneously. If this is done, any configuration
change actions will be reflected on all the displays. Whenever application stop or start
actions are taken by the display operator, a message is sent to the Program Control function
FG50, which is responsible for enacting the start or stop action. The Program Control
function FG50 also sends indications of any status changes to the Program Control Displays
FG54A-FG54N as soon as the status changes are seen. Periodic status updates
are also sent to the Program Control Displays FG54A-FG54N.

The Program Control Configuration Files are text files that are read in by the Program
Control Display when the operator wishes to load a new application configuration. A
Configuration File is an ASCII file containing a list of applications. The format of an entry
in a Configuration File is shown in Table 1 below.

Table 1

    Application   TACFIRE:tacfire
    Host          electral
    Display       umbriell:0.0
    Auto_Start    0
    RM_Start      0
    Console       1
    Time_Delay    1
    StartupDir    "$ENV_SIM_VERSION/TACFIREprocessor"
    StartupExe    "$ENV_SIM_VERSION/TACFIREprocessor/tacfire"
    StartupArgs   "-disport $DIS_PORT_NUM -cffhost %(HOSTNAME, AAW:Tactical_Sims:CFF_Broker)"
    ShutdownExe   SIGINT
    LogType       STDOUT
    LogDir        "/usr/tmp"

The Configuration file advantageously can include the following fields:

1) The Application field, which identifies the full application name as defined
in the System Spec. Files FG32 (i.e., System:Subsystem:Application).

2) The Host field, which is the desired or default host that this application
should be started on.

3) The Display field, which is an optional field used when graphical display
output from an application needs to be rerouted to a display on a different
host.

4) The Auto_Start flag, which identifies whether the application is to be started
automatically if the "Start All" action is selected by the operator from the
Program Control Display. (If the flag is set to "1", the application is
started; if the flag is set to "0", it is not started.)

5) The RM_Start flag, which identifies whether the Resource Manager should
be queried at run-time to determine what host the application should be
started on. The valid values are "0" for "NO" and "1" for "YES".

6) The Console flag, which identifies whether the application needs to be started
in an xterm window. The valid values are "0" for "NO" and "1" for "YES".

7) The Time_Delay field, which identifies how many seconds to wait after the
previous application has been started before starting this application.

8) The StartupDir field, which identifies the current working directory that is
to be set prior to starting up the application. This directory is usually the same
as the directory where the executable for the application resides but does not
have to be. As this example shows, environment variables may be used in the
path.

9) The StartupExe field, which identifies the entire path and name of the application
executable.

10) The StartupArgs field, which contains all the argument values needed for
this particular application. As this example indicates, the argument values can
be dynamically set at run time if needed. Environment variables may also be
used within the argument list. For example, a %(UNIQUE, 1, 40, Isis)
argument would yield a number from 1 to 40 that is unique within a
context named "Isis"; another resolution of %(UNIQUE, 1, 40, Isis) would
yield a different number.

11) The ShutdownExe field, which identifies the signal that Program Control is to
use to shut down this application. Some examples would be SIGINT, SIGTERM, or
SIGKILL. A shutdown script can also be used to shut down the application. (In that
case, ShutdownDir, ShutdownExe, and ShutdownArgs fields would be listed; the
shutdown fields are used in exactly the same way as the startup fields.)

12) The LogType field, which identifies which outputs are to be written to the 
specified log file. The valid values are STDOUT, STDERR, and LOG_ALL. 
STDOUT is the normal output of the application (stdout). STDERR is the 
error output of the application (stderr). LOG_ALL writes both stdout and 
stderr outputs to the file. 

13) The LogDir field indicates the directory where the log file will be written.
Again, environment variables may be used here. The log file name will be
"<userid>_<appname>_<pid>.log", where <appname> is the full application
name as specified in the Application field, <userid> is the userid of the
current user under which the Program Control application is running, and
<pid> is the system-assigned process id of the application being executed.
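A minimal sketch of how the LogType and LogDir settings could be turned into a log file path and a set of captured streams. The names LOG_STREAMS, log_file_path, and streams_for are hypothetical; only the "<userid>_<appname>_<pid>.log" naming pattern, the environment-variable expansion, and the three LogType values come from the description above.

```python
import os

# Streams captured for each LogType value described above.
LOG_STREAMS = {
    "STDOUT": ("stdout",),
    "STDERR": ("stderr",),
    "LOG_ALL": ("stdout", "stderr"),
}


def log_file_path(log_dir, userid, appname, pid):
    """Return <LogDir>/<userid>_<appname>_<pid>.log, expanding environment variables in LogDir."""
    directory = os.path.expandvars(log_dir)
    return os.path.join(directory, f"{userid}_{appname}_{pid}.log")


def streams_for(log_type):
    """Look up which application outputs go to the log file for a LogType value."""
    try:
        return LOG_STREAMS[log_type]
    except KeyError:
        raise ValueError(f"invalid LogType: {log_type}") from None
```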

FG6 - DISPLAY FUNCTIONAL GROUP 

A number of displays which show system configuration data and instrumentation data
in near real-time are included as part of the Resource Management Architecture. These
displays support operator and user monitoring of the operation of the distributed environment,
including host and network statuses and performance, application system statuses and
performance, as well as the status and performance of the other Resource Management
architecture functions. Most of the displays use OpenGL and Motif, the latter being built with
ICS's Builder Xcessory toolkit, and run on Silicon Graphics (SGI) platforms in an exemplary
case. Several of the displays can also run on Sun Solaris platforms. The displays that
make up the display functional group FG6 include:
1) Host Displays FG62A-FG62N. Show the layout of hosts along with host status,
network connectivity, and process statuses.

2) Path Display FG64. Shows the status of applications in key end-to-end data 
flow paths along with performance and load graphs. 

3) Resource Management Decision Review Display FG68. Shows a summary 
of allocation decisions made by the Resource Management system along with 
timing information and host fitness scores. 

4) Graph Tool Instrumentation Displays FG69A-FG69N. Provides a user- 
configurable set of display widgets used for run-time monitoring of 
instrumented status and performance information. 

5) System Readiness Display FG66. Shows the status of each hardware and
software system, subsystem, and application defined in the System
Specification Files and allows the operator to interactively change system and
subsystem priorities.

FIGS. 6A, 6B represent a screen capture of an exemplary one of the Host Displays
FG62A-FG62N, which provide graphical representations of various sets of the hosts A-N
in the distributed environment. The Host Displays show the status of each host, host network 
connectivity, and the status of interesting processes running on the hosts. The Host Display 
operator can also select hosts shown on the Host Display and bring up real-time graphs of 
system performance for the selected hosts including CPU utilization, memory utilization, 
network packets in, network packets out, and paging activity. A screen capture of host 
specific performance information is provided in FIGS. 7A, 7B. 

FIGS. 8A, 8B represent a screen capture of a representative Path Display FG64, 
generated by the Resource Management architecture, which shows the status of key system 
data flow paths consisting of multiple application stages. The number of copies of each 
application in the path is shown labeled with the host on which the application is running. 
In addition, as many as three real-time graphs can be produced
to depict run-time performance and load metrics related to the applications in the selected
data path.

FIGS. 9A, 9B represent a screen capture of the Resource Management Decision 
Review Display FG68, which advantageously can provide a summary of allocation and
reallocation actions taken by the Resource Manager FG42. For each action, timing 
information regarding how long it took the Resource Management functions, e.g., the 
Resource Manager FG42 and the Program Controller FG50, to both arrive at a decision and 
to enact the decided action are shown along with host fitness scores that were used in 
arriving at the allocation decision. 

FIGS. 10A, 10B and 11A, 11B are screen captures of the Graph Tool Instrumentation
Displays FG69A-FG69N, which depict user-configurable displays capable of receiving data
via standardized message formats and open interfaces. The Graph Tool Displays FG69A-
FG69N allow the operator to select and configure various display widgets (line graphs, bar
charts, pie charts, meters, and text boxes) to build a desired display layout. Data sources for 
driving the widgets can also be selected interactively. 

FIGS. 12A, 12B represent a screen capture of the System Readiness Display FG66, 
which advantageously can be a Java™ display with a CORBA™ interface. The display FG66
shows the status of each hardware system, host, application system, application subsystem,
and application defined in the System Specification Files. The top portion of the display
shows a summary status for each defined application system. It should be noted that the
display operator can also change system and subsystem priorities and send the changed 
priorities to the Resource Manager function FG42. 

As mentioned above, the RMComms middleware package provides object-oriented 
client-server services for message communication between distributed applications and
function modules. The middleware provides location transparency and automatic socket 
connections and reconnections between client and server applications. These services 
advantageously can be accessed through an object-oriented API which allows client and 
server objects to be easily created and exchange user-defined message data. The abstraction 
provided by the API allows the user to quickly and easily create distributed applications 
without needing to be aware of the details of the underlying network mechanisms. The 
RMComms middleware provides the following functions: 

• provides location transparency between clients and servers
• provides a simple, powerful object-oriented client-server API
• supports reliable transport of user-defined message data
    - based on Berkeley sockets
    - uses TCP for message transport
    - uses UDP multicast for identification of new clients or servers
    - servers identified by unique assigned UDP/TCP port numbers
• provides general-purpose callback function registration capabilities
    - user-specified message callback functions invoked when specified
      messages arrive
    - user-specified connection status callback function invoked when new
      client-server connections are established or existing connections are
      broken
• support for multi-threading
    - supports both polled and asynchronous I/O
    - thread-safe
• provides automatic connections between clients and servers
    - supports multiple client and server connections within the same
      application
    - provides automatic connections to new clients / new servers
    - supports simultaneous many-to-many client-server connections
    - no separate "naming service" or "application registration" components
• provides automatic client-server connection fault detection and recovery
    - provides fault detection mechanisms based on timeouts and broken
      connections
    - supports fault recovery via automatic reconnections between clients
      and servers
• provides basic support for data marshalling between machine architectures
    - byte-swapping
    - explicit message data type specification
    - all message data sent out using network byte order
• provides basic capabilities for reading the system clock and performing time
  conversions
• allows registration of user-defined signal (interrupt) handler functions
• layered object-oriented design and implementation
• cross-platform support:
    - SGI IRIX 6.3/6.4/6.5
    - Sun Solaris 2.5.1/2.6/2.7/2.8
    - HP HP-UX 10.20
    - Linux 2.1/2.2
    - Windows NT 4.0
    - Windows 95/98/2000
    - Solaris x86 2.7
• C++ language support using native and GNU compilers
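To illustrate the marshalling items above (explicit data types, all data in network byte order), the sketch below packs a hypothetical host-load message with Python's struct module, where the "!" prefix selects network (big-endian) byte order. The message layout and function names are assumptions for illustration and are not the RMComms wire format.

```python
import struct

# Hypothetical message layout: message id (unsigned 16-bit), host id (unsigned 32-bit),
# load (64-bit float). The "!" prefix selects network byte order, standard sizes, no padding.
HOST_LOAD_FORMAT = "!HId"


def pack_host_load(msg_id, host_id, load):
    """Serialize the fields in network byte order, independent of the sender's CPU."""
    return struct.pack(HOST_LOAD_FORMAT, msg_id, host_id, load)


def unpack_host_load(data):
    """Deserialize on any architecture; struct performs any needed byte swapping."""
    return struct.unpack(HOST_LOAD_FORMAT, data)
```

Because both sides agree on explicit types and a fixed byte order, a little-endian Linux host and a big-endian SGI host decode the same bytes identically.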

The RMComms middleware is implemented as a shareable object-oriented C++
library. The library provides four primary object classes, which are detailed in attached
Appendix C. It will be appreciated that the applications link with this library and can then
instantiate client and server objects for communicating with other local or remote
applications. The application source code must also include a set of header files that allow
connections between client and server objects, where each server type is assigned a server
port number. For clients and servers that want to communicate, both the client and the server
objects are created specifying the same server port number. Multiple servers of the same
type can also be created, all of which use the same server port number. This advantageously
provides the ability for many-to-many client-server connections to be established, as
illustrated in FIG. 4. Control of which servers the clients actually connect to is handled on
the client side; clients can specify whether they wish to establish connections with all
servers in the distributed environment, with a particular set of servers, or with all
servers running on a particular set of hosts.
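The client-side server selection just described could be modeled as a filter over discovered servers. In this sketch, ServerInfo, select_servers, and the three mode names are hypothetical; it assumes servers have already been discovered (for example through the UDP multicast identification mentioned earlier) and that a client only considers servers sharing its server port number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    name: str   # hypothetical server instance name
    host: str   # host on which the server runs
    port: int   # server port number identifying the server type


def select_servers(discovered, port, mode="all", names=(), hosts=()):
    """Choose which discovered servers a client connects to.

    mode "all":   every server of this type in the distributed environment
    mode "names": only a particular set of servers
    mode "hosts": only servers running on a particular set of hosts
    """
    candidates = [s for s in discovered if s.port == port]  # same server type only
    if mode == "all":
        return candidates
    if mode == "names":
        return [s for s in candidates if s.name in names]
    if mode == "hosts":
        return [s for s in candidates if s.host in hosts]
    raise ValueError(f"unknown selection mode: {mode}")
```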

The operation of the Resource Management Architecture will now be described while
referring to FIGS. 13A-13C, which illustrate various operations in the distributed
environment. More specifically, the Resource Management Architecture of the system
illustrated in FIG. 13A includes hosts A-N, where host A provides a video source server
application A-1; host B provides a video distribution application B-1, a contract application
B-2, and a host load monitor B-3; and host C provides a display broker application C-1
applying video signals to a display driver C-2. It will be appreciated that host D is idle and
that the connections between the various hosts constitute the network 100'. In addition, the
Resource Management Architecture of FIG. 13A instantiates various functions, e.g., an
instrumentation broker FG26', a QoS manager FG44', a resource manager FG42' and a
program control FG50'. The instrumentation broker FG26' receives data from each of the
applications running in the distributed environment, although only the lines of
communication between the applications running on host B are actually depicted. From the
discussion above, it will be appreciated that each of the applications is linked to an
Instrumentation API.



NCN-83018 

Referring now to FIG. 13B, a QoS violation and its consequences are depicted. In
particular, the instrumentation broker FG26' provides data to the QoS manager FG44' which
is indicative of a QoS violation. The QoS manager FG44' notifies the resource manager
FG42' of the violation; the resource manager determines that duplicate copies of the
applications running on host B are required and that these copies should be placed on host
D. The resource manager FG42' transmits instructions to the Program Control function
FG50', which starts copies of the running applications, i.e., a video distribution application
D-1, a contract application D-2, and a host load monitor D-3, on host D. FIG. 13C illustrates
shutdown of the application copies running on host B. It will be appreciated that this
shutdown may be initiated responsive to the original QoS violation, another QoS violation,
or a query from the user.

Having discussed the various functions and features of the Resource Management
Architecture in general, selected functions and features will now be described in detail. It
will be appreciated that the discussion of the various functions will be signaled using the
designations established with respect to FIGS. 2A, 2B.

FG42 - Resource Manager Function 

As mentioned above, the Resource Manager FG42 is the primary decision-making 
component of the Resource Management functional group. It is responsible for: 

(1) responding to application and host failures by determining if and what 
recovery actions should be taken; 

(2) determining if and where to place new copies of scalable applications or
which scalable applications should be shut down when the QoS Managers
indicate that scale-up or scale-down actions should be taken based on
measured application performance;

(3) determining where new applications should be placed when requested to do
so by Program Control; and

(4) determining which and how many applications should run based on
application system (mission) priorities.

In order to accomplish these tasks, the Resource Manager FG42 maintains a global view of 
the state of the entire distributed environment including status information on all hosts, 
networks, and applications. In addition, the Resource Manager FG42 also calculates software 
and hardware readiness metrics and reports these readiness values for display purposes. 

The Resource Manager FG42 is an object-oriented multi-threaded application written
in C++, which uses the RMComms middleware for all external communication. The
Resource Manager FG42 communicates with the software components instantiating
(1) Program Control FG50, (2) Hardware Broker FG40, (3) QoS Managers FG44A-
FG44N, (4) QoS Specification Control FG29, (5) the Readiness Broker in Readiness Display
FG66, (6) the Globus Broker (not shown), and (7) RM Decision Review Displays FG68A-
FG68N.

It will be appreciated that the Resource Manager FG42 receives status and failure
information about hosts and networks from the Host and Network Monitoring functional
group FG1, and about applications from the Program Control functional group FG5. This
information includes periodic status updates as well as immediate updates when statuses
change, e.g., when a new host is detected or an application fails. In the case of any
application shutdown, information as to whether the application was intentionally
shut down or actually failed advantageously can be provided. The
Program Control function FG50 also issues requests to the Resource Manager FG42
whenever new applications need to be dynamically allocated and whenever the Program 
Control function FG50 determines that the Resource Manager FG42 needs to assess and 
attempt to resolve inter-application dependencies (e.g., one application which needs to be 
running prior to starting up another application). 
The Resource Manager FG42 responds to application faults and host failures by
determining whether the failed applications can and should be restarted and attempting to
determine where (and if) there are hosts available that the application can run on. When a 
decision is made by the Resource Manager FG42, a message is sent to Program Control 
function FG50 specifying what application to start and where to put it. The same general 
mechanism is used when the Program Control function requests that the Resource Manager 
FG42 determine where to start new applications and/or how to resolve inter-application 
dependencies; the Resource Manager FG42 responds with orders indicating what 
applications to start and where to start them. The Resource Manager FG42 advantageously 
can send application shutdown orders to the Program Control function FG50 requesting that 
a certain running application be stopped; this can occur when the QoS Managers indicate that 
certain scalable applications have too many copies running or when application system 
priority changes (to lower priorities) occur resulting in scaling back the application system 
configuration. See Figs. 13B and 13C and the associated discussion above. 

The Resource Manager FG42 receives host load and host fitness information from 
the Hardware Broker (Host Load Analyzer) function FG40. This information includes overall 
host fitness scores, CPU-based fitness scores, network-based fitness scores, and memory and 
paging-based fitness scores along with the SPEC95 rating of the hosts. This information is 
received approximately once a second and includes information on all known hosts in the 
distributed system. These scores are used by the Resource Manager FG42 for determining 
the "best" hosts for placing new applications when: 

(1) responding to requests from the QoS Managers to scale up additional copies
of an application;

(2) attempting to restart failed applications;

(3) responding to requests to dynamically allocate certain applications; and

(4) responding to application system (mission) priority changes which require 
scaling up additional applications. 
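The "best host" selection can be pictured as ranking candidate hosts by their overall fitness scores. This sketch assumes that a higher score is better, that the candidate list comes from the System Specification Files (where an application may run), and that hosts absent from the latest Hardware Broker update are skipped; best_hosts is a hypothetical name. Returning the top few hosts rather than only one also fits the "top choices" later reported to the RM Decision Review Display.

```python
def best_hosts(fitness, allowed_hosts, count=1, exclude=()):
    """Rank eligible hosts by overall fitness score, best first.

    fitness:       host -> overall fitness score from the latest Hardware Broker update
    allowed_hosts: hosts on which the application may run (from the specification files)
    exclude:       hosts to skip, e.g. the host of a copy being moved away
    """
    eligible = [h for h in allowed_hosts if h in fitness and h not in exclude]
    eligible.sort(key=lambda h: fitness[h], reverse=True)  # assumption: higher is better
    return eligible[:count]
```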
Advantageously, the Resource Manager FG42 also receives requests from the QoS
Managers FG44A-FG44N for scaling up, moving, or scaling down specific applications.
The Resource Manager FG42 responds to these requests by determining whether the request
should be acted upon and, if so, determining the specific action to take and issuing orders to
the Program Control function FG50 to start up or shut down specific applications on specific
hosts. The QoS Managers FG44A-FG44N are responsible for monitoring specific system
performance metrics (e.g., quality of service, or QoS, requirements) via instrumentation and
determining if performance can be improved by scaling up or moving certain applications.
When this is the case, the QoS Managers send a request to the Resource Manager FG42
indicating that a new copy of a specific application should be started. If the QoS Managers
determine that the performance of a scalable application can be improved by moving an
application, a scale-up request is first sent to the Resource Manager FG42 and, when the new
application has been started, a scale-down request is then sent to the Resource Manager FG42.
Moreover, when the QoS Managers FG44A-FG44N determine that there are more copies
of a scalable application running than are needed, requests to shut down specific applications
are sent to the Resource Manager FG42.
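The move sequence above (scale up first, then scale down the poorly performing copy only after the new copy is running) can be sketched as a small state machine on the QoS Manager side. MoveRequest, MoveState, the tuple message format, and the send callback are hypothetical stand-ins for the RMComms messages described in CD-Appendix D.

```python
from enum import Enum


class MoveState(Enum):
    IDLE = "idle"
    AWAITING_NEW_COPY = "awaiting_new_copy"


class MoveRequest:
    """Hypothetical QoS Manager bookkeeping for moving one application copy."""

    def __init__(self, send):
        self._send = send        # callable delivering a request to the Resource Manager
        self.state = MoveState.IDLE
        self._pending = None     # (application, host of the poorly performing copy)

    def start(self, app, bad_host):
        # Step 1: request a new copy; the poorly performing copy keeps running for now.
        self._send(("scale_up", app, bad_host))
        self._pending = (app, bad_host)
        self.state = MoveState.AWAITING_NEW_COPY

    def on_app_started(self, app):
        # Step 2: only after the new copy is running, request shutdown of the old one.
        if self.state is MoveState.AWAITING_NEW_COPY and self._pending[0] == app:
            self._send(("scale_down", app, self._pending[1]))
            self.state = MoveState.IDLE
            self._pending = None
```

Deferring the scale-down keeps at least one copy of the application running throughout the move.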

It will be appreciated that the Resource Management Architecture distributes
functionality between the QoS Managers FG44A-FG44N and the Resource Manager FG42.
Thus, the QoS Managers determine what actions would potentially improve performance,
while the Resource Manager FG42 has final authority to determine whether to implement the
requested actions.

It should be noted that when the Resource Manager FG42 is first started, it reads in
the System Specification Files FG32 (via calls to the System Specification Library (SSL)
FG34), which contain the list of hosts that are known to be (operating) in the distributed
environment and information on all applications that can be run in the distributed
environment. The application-level information includes where specific applications can be
run, which applications are scalable, which applications can be restarted, and any
dependencies between applications. In addition, the Resource Manager FG42 receives
updated application survivability specifications from the QoS Specification Control function.
This information overrides the application survivability information that was initially loaded
in from the System Specification Files FG32 for the specified application. The information
is used by the Resource Manager FG42 to determine whether the specific application will be
restarted if it fails at run-time.
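A minimal sketch of the restart decision described above, in which survivability settings from QoS Specification Control take precedence over values loaded from the System Specification Files. The dictionaries, the function name, and the rule that an intentional shutdown is never restarted are assumptions made for illustration.

```python
def should_restart(app, spec_restartable, qos_overrides, intentional_shutdown):
    """Return True if a stopped application should be restarted.

    spec_restartable: app -> bool loaded from the System Specification Files
    qos_overrides:    app -> bool received later from QoS Specification Control
    """
    if intentional_shutdown:
        return False  # assumption: deliberate shutdowns are not failures to recover from
    if app in qos_overrides:
        return qos_overrides[app]  # run-time update overrides the file value
    return spec_restartable.get(app, False)  # unknown applications are not restarted
```

If this returns True, a host for the restarted copy would still have to be chosen from the fitness scores before an order is sent to Program Control.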

It should also be noted that the Resource Manager FG42 sends application system and
hardware system readiness and system (mission) priority information to the Readiness
Broker, which is a translator within the Readiness Display FG66, and to the Globus Broker
(another broker, not shown). The Readiness Broker is responsible for driving a GUI/display
FG66, which shows the current readiness data and allows the system (mission) priorities to
be changed and sent back to the Resource Manager FG42. The Globus Broker provides
basically the same functionality, except that only a high-level subset of the readiness data
provided to the Readiness Broker is provided to the Globus Broker. The readiness
information sent to the Readiness Broker consists of readiness values for each application,
application subsystem, and application system defined in the System Specification Files
FG32. The scores advantageously can be based on the status (up/down) of the applications
and the percentage of potential copies of scalable applications that are currently running.
Host and network readiness scores are determined based on the host loads and host fitness
scores received from the Hardware Broker FG40.
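The description says only that readiness scores can be based on up/down status and on the percentage of potential copies running. The sketch below makes that concrete under assumed rules: a down application scores 0, a non-scalable running application scores 1, a scalable one scores its fraction of potential copies, and subsystem or system readiness is the plain mean of its members. Both the formulas and the function names are hypothetical.

```python
def app_readiness(is_up, running_copies=0, potential_copies=0):
    """Readiness in [0, 1] for one application (assumed scoring rules)."""
    if not is_up:
        return 0.0
    if potential_copies > 0:  # scalable: fraction of potential copies running
        return min(running_copies / potential_copies, 1.0)
    return 1.0  # non-scalable application that is up


def rollup(scores):
    """Assumed subsystem/system readiness: plain mean of member scores."""
    return sum(scores) / len(scores) if scores else 0.0
```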

The Resource Manager FG42 also sends information about allocation and 
reallocation decisions to the RM Decision Review Display FG68 (FIGS. 9A, 9B). 
Information on the decision that was made, what event the decision was in response to, and 
how long it took to both make the decision and implement the decision are sent to the 
display. In addition, information about the top choices for where an application could have
potentially been placed is also sent (if applicable); this information includes the host fitness 
scores for the selected host and other hosts which could have been selected. 

As described above, the Resource Manager function FG42 communicates with 
Program Control FG50, the Hardware Broker FG40, the QoS Managers FG44A -FG44N, 
QoS Specification Control (not shown - legacy function), the Readiness Broker of the 
Readiness Display FG66, the Globus Broker (not shown), and the RM Decision Review 
Display FG68 using the RMComms middleware. The message formats and contents of each 
message that is exchanged between the Resource Manager function FG42 and other 
functional elements of the Resource Management architecture are described in CD-Appendix
D. The timing and/or event trigger for each message is also described. 

FG40 - Host Load Analyzer (Hardware Broker) Function 

The Hardware Broker FG40 provides the host load analysis function of the Resource 
Management functional group FG4. It is responsible primarily for determining the host and 
network loads on each host within the distributed computing environment. The Hardware 
Broker FG40 assigns a set of fitness scores for each host and periodically provides the list 
of fitness scores to the Resource Manager FG42. FIG. 14 illustrates the connectivity and 
high-level data flow between the Hardware Broker and the other Resource Management and 
Resource Management-related components. 

The Hardware Broker FG40 is an object-oriented multi-threaded application written
in C++, which uses the RMComms middleware for all external communication. It receives
operating system-level statuses and statistics for each host from the History Server(s) FG12A
-FG12N. This information is used for calculating CPU, network, memory, paging activity,
and overall fitness scores for each host. The Hardware Broker periodically (once per second)
sends the list of host fitness scores to the Resource Manager FG42.

When the Hardware Broker FG40 is first started, it reads in the System Specification
Files FG32 (via calls to the System Specification Library (SSL) FG34), which contain the list of
hosts that are known to be in the distributed environment. The Hardware Broker also reads
in the file networks.dat, which contains information about the bandwidth and
maximum packet sizes on known network subnets. This data is
used for converting host network load information based on packet counts to load
information based on bytes per second and percentage of available bandwidth.
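A sketch of the packet-to-bytes conversion, using the bandwidth and maximum packet size from a networks.dat entry. It assumes every packet is MTU-sized, which gives an upper-bound estimate; the Subnet class and network_load function are hypothetical, and the actual formula used by the Hardware Broker is not given here.

```python
from dataclasses import dataclass


@dataclass
class Subnet:
    bandwidth_bps: float  # maximum subnet bandwidth, bits per second
    mtu_bytes: int        # maximum packet (MTU) size, bytes


def network_load(packets_per_sec, subnet):
    """Convert a packet rate into (bytes/second, percent of available bandwidth).

    Assumption: every packet is MTU-sized, so the result is an upper bound.
    """
    bytes_per_sec = packets_per_sec * subnet.mtu_bytes
    percent = 100.0 * (bytes_per_sec * 8) / subnet.bandwidth_bps
    return bytes_per_sec, percent
```

For example, 100 packets per second on a 10 Mbit/s subnet with a 1500-byte MTU gives 150,000 bytes per second, or 12 percent of the available bandwidth.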


The Hardware Broker FG40 uses two other RMComms interfaces. Periodically
(approximately every three seconds), the Hardware Broker FG40 sends a list of overall and
network host fitness scores to the Hardware Broker Instrumentation Displays FG69A-FG69N.
As mentioned above, these displays were constructed using the Graph Tool described with
respect to the Graph Tool Instrumentation Displays. Additionally, the Hardware Broker FG40
can receive host-based network load data from the Remos Broker FG16, which receives
network data via the Remos Network Monitoring software (denoted 2 in FIGS. 2A, 2B). If
Remos network data is available for any of the hosts that are being monitored, the Remos
data is used for the network fitness score calculation for that host rather than the host
network data received from the History Server(s).
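The source-selection rule for the network fitness calculation can be sketched as follows, combined with the five-second freshness limit given for the Remos Host Network Bandwidth Database in Table II. The dictionary layouts and the function name are assumptions for illustration.

```python
import time

REMOS_MAX_AGE_S = 5.0  # Remos data older than this is ignored (per Table II)


def network_input_for(host, remos_db, history_db, now=None):
    """Pick the network-load source for one host's network fitness score.

    remos_db:   host -> (timestamp, available_bandwidth) from the Remos Broker
    history_db: host -> network load derived from History Server data
    """
    now = time.time() if now is None else now
    entry = remos_db.get(host)
    if entry is not None and now - entry[0] < REMOS_MAX_AGE_S:
        return "remos", entry[1]  # fresh Remos data takes precedence
    return "history", history_db.get(host)
```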

The exemplary instance of the Hardware Broker FG40 is an object-oriented multi-
threaded application. At the highest level, the Hardware Broker object contains the elements
listed in Table II below, which also gives a brief description of each of these objects.
Additional details are provided in CD-Appendix E.

Table II 

1) Host Fitness Database object (FitnessDB class)
   The Host Fitness Database object stores load history data and fitness score
   information for each host. The Host Fitness Database is updated and fitness scores
   are recalculated when new History Server Host Status Response Messages are
   received. For each host, a circular queue of host load history data (HostInstance
   class) is maintained with the newest data being placed at the end of the queue;
   this history data is used for recalculating host fitness scores. The Host Fitness
   Database also contains a System Specification Library (SSL) object which is used
   to access SPEC rating information for the hosts.

2) Signal Registration object (SignalRegistry class)
   The Signal Registration object allows a user-defined SIGINT signal handler to be
   registered in order to permit the Hardware Broker FG40 to be shut down gracefully.

3) Network Subnet Information Database object (SubnetDB class)
   The Network Subnet Information Database object is used to store the IP address,
   maximum bandwidth, and MTU size for each network specified in the networks.dat
   file. This information is used for converting network packet load information to
   bytes/second network load information.

4) Remos Host Network Bandwidth Database object (RemosDB class)
   The Remos Host Network Bandwidth Database object stores the latest Remos-reported
   network bandwidth information for each host being monitored. The information
   stored consists of available bandwidth as well as maximum potential bandwidth on
   a specific host network link. If Remos bandwidth information is available for a
   host and the latest data is less than 5 seconds old, the Remos data will be used
   for calculating the network fitness score for the host.

5) History Server Interface object (HistServInterface class)
   The History Server Interface object inherits from the RMComms TCPCommClient class
   and is responsible for maintaining connections to the History Server(s), for
   registering status and message handler callback functions, for sending messages
   to the History Server(s), and for invoking the status and message handler
   callback functions when connections to History Servers are either established or
   broken or new messages are received from a History Server.

6) Instrumentation Graph Tool Display Interface object (InstrInterface class)
   The Instrumentation Graph Tool Display Interface object inherits from the RMComms
   TCPCommServer class and is responsible for maintaining connections to the Graph
   Tool Display(s), for registering status and message handler callback functions,
   for sending messages to the Graph Tool Displays, and for invoking the status and
   message handler callback functions when connections to Graph Tool Displays are
   either established or broken or new messages are received from a Graph Tool
   Display.

7) Resource Manager Interface object (ResMgrInterface class)
   The Resource Manager Interface object inherits from the RMComms TCPCommServer
   class and is responsible for maintaining connections to the Resource Manager, for
   registering status and message handler callback functions, for sending messages
   to the Resource Manager, and for invoking the status and message handler callback
   functions when connections to the Resource Manager are either established or
   broken or new messages are received from the Resource Manager.

8) ...
   ... functions, for sending messages to the Remos Broker, and for invoking the
   status and message handler callback functions when connections to the Remos
   Broker are either established or broken or new messages are received from the
   Remos Broker.
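The per-host circular queue kept by the Host Fitness Database can be approximated with a bounded deque, which discards the oldest sample as each new one is appended at the end. HostHistory stands in for the HostInstance records, and the CPU fitness formula (one minus the mean load) is a hypothetical placeholder, since the actual scoring formulas are detailed in CD-Appendix E rather than here.

```python
from collections import deque


class HostHistory:
    """Per-host circular queue of load samples (stand-in for HostInstance records)."""

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)  # oldest entries drop off automatically

    def add(self, cpu_load):
        self.samples.append(cpu_load)  # newest sample is placed at the end

    def cpu_fitness(self):
        """Hypothetical fitness: 1 - mean CPU load over the retained history."""
        if not self.samples:
            return 0.0
        return 1.0 - sum(self.samples) / len(self.samples)
```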



FG44 - Quality-of-Service (QoS) Manager Function

The QoS Managers FG44A-FG44N are responsible for monitoring application-level
performance requirements, which are defined in the System Specification Files
FG32 and are monitored primarily via instrumentation data obtained directly from the
application code. The QoS Managers FG44A-FG44N advantageously determine if
applications or application paths are satisfying their assigned requirements. When an
application is not meeting its performance requirements and the application is scalable (in
the sense that multiple copies can be run and the copies will share the load among
themselves), the QoS Managers FG44A-FG44N will either request that the Resource Manager
FG42 scale up a new copy of the application or move the application to a new host (which
may result in better performance). Moreover, if there are multiple copies of a
scalable application running, and all copies are performing below the specified requirement
threshold, the QoS Managers FG44A-FG44N will request that the Resource Manager
shut down a specific copy.

The QoS Manager is a single-threaded application written in C/C++. It should be
noted that the application can be scaled for redundancy and/or load-sharing. In an
exemplary case, each copy of the QoS Manager monitors all of the requirements associated
with a single application path defined in the System Specification Files FG32. It will be
appreciated that the specific path to be monitored can be specified via command-line
parameters. By default, without specifying a path via the command line, the QoS Manager
will monitor all requirements for all defined paths.

As mentioned above, the QoS Manager advantageously uses a sliding window
algorithm to determine when to declare that applications should be scaled up or scaled down.
The inputs to the algorithm define both high and low sampling window sizes, the maximum
number of allowed violations within the sampling window, and violation thresholds as a
percentage of the actual specified requirement value. The sliding window algorithm was
selected in an effort to damp out unexpected "noise" or "spikes" in the measured performance
data. Thresholds are stated as a percentage of the actual requirement value so that scale-up
or scale-down actions occur before the specified hard requirement is violated. It will be
understood that the success of this approach is highly dependent on the rate of change and
noisiness of the measured data.
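A simplified sketch of the sliding window check for a single requirement. It assumes a metric where larger values are worse (for example, latency), treats a sample above high_pct times the requirement as a high violation and one below low_pct times the requirement as a low violation, and fires when the violation count in the respective window exceeds the allowed maximum. The class name, parameter names, and the choice to clear a window after it fires are hypothetical.

```python
from collections import deque


class SlidingWindowMonitor:
    """Sliding-window scale-up/scale-down check for one "larger is worse" requirement."""

    def __init__(self, requirement, high_pct, low_pct,
                 high_window, low_window, max_violations):
        self.high_limit = requirement * high_pct  # e.g. 0.9 acts before the hard limit
        self.low_limit = requirement * low_pct
        self.high_hits = deque(maxlen=high_window)  # True where a sample exceeded high_limit
        self.low_hits = deque(maxlen=low_window)    # True where a sample fell below low_limit
        self.max_violations = max_violations

    def add(self, value):
        """Record one measurement; return "scale_up", "scale_down", or None."""
        self.high_hits.append(value > self.high_limit)
        self.low_hits.append(value < self.low_limit)
        if sum(self.high_hits) > self.max_violations:
            self.high_hits.clear()  # start a fresh window after requesting action
            return "scale_up"
        if sum(self.low_hits) > self.max_violations:
            self.low_hits.clear()
            return "scale_down"
        return None
```

A single spike that slides out of the window before further violations accumulate never triggers an action, which is the damping effect described above.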

Again, the QoS Manager uses the RMComms middleware for all external
communication. Each copy of the QoS Manager talks to (1) the Resource Manager FG42,
(2) Program Control FG50, (3) QoS Specification Control (not shown), (4) QoS Monitor
FG29, (5) Instrumentation Correlators FG26A-FG26N, (6) Graph Tool Instrumentation
Displays FG69A-FG69N, and (7) History Servers FG12A-FG12N. In an exemplary case, the
QoS Managers FG44A-FG44N advantageously can receive configuration orders from the
Resource Manager FG42, which allow the Resource Manager FG42 to configure each QoS
Manager to monitor specific application paths and also to set the sliding window criteria to
be used by each respective QoS Manager.
Each copy of the QoS Manager advantageously can transmit application scale up and
scale down requests to the Resource Manager FG42 when the measured performance data
for a respective application violates either the high (scale up) or low (scale down) sliding
window criteria for a specific requirement. A scale up request indicates which application
on which host has violated the performance criteria, and a scale down request indicates
which application on which host is recommended to be shut down. Each copy of the QoS
Manager can also request that the Resource Manager move an application. This will occur
when one copy of an application is performing much worse than all other running
copies. The move request is implemented as a scale up request followed by a scale down
request (of the badly performing copy); the scale down request is not transmitted to the
Resource Manager FG42 until the scale up action has been implemented.
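The ordering constraint on move requests, where the scale down of the poorly performing copy is held back until the scale up has taken effect, can be sketched as a small sequencer. The `Request` and `MoveSequencer` types below are hypothetical; the sketch assumes the QoS Manager learns that the new copy is running from Program Control status updates, as described later in this section.

```cpp
#include <optional>
#include <string>

// Illustrative request type; the real message formats are in CD-Appendix F.
struct Request {
    enum class Kind { ScaleUp, ScaleDown } kind;
    std::string application;
    std::string host;
};

// Hypothetical helper that sequences a "move" as scale up, then scale down.
class MoveSequencer {
public:
    // Starts a move of `app` away from `badHost`; returns the scale up request to send now.
    Request begin(const std::string& app, const std::string& badHost) {
        pending_ = Request{Request::Kind::ScaleDown, app, badHost};
        return Request{Request::Kind::ScaleUp, app, badHost};
    }

    // Called when status updates show a new copy of `app` has started.
    // Only then is the deferred scale down request released.
    std::optional<Request> onCopyStarted(const std::string& app) {
        if (pending_ && pending_->application == app) {
            Request r = *pending_;
            pending_.reset();
            return r;
        }
        return std::nullopt;
    }

private:
    std::optional<Request> pending_;  // scale down waiting for the scale up to land
};
```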

The QoS Managers FG44A - FG44N use the application "settling times" defined in
the System Specification Files to ensure that, once a requested action has been sent to the
Resource Manager, no additional actions are requested until the application settling
time has elapsed. This provides time for initialization and configuration among the
application copies to occur. In future releases, the inter-application dependencies will be used
instead.

The division of responsibility between the QoS Managers FG44A - FG44N and the 
Resource Manager FG42 is as follows: 

(1) the QoS Managers FG44A - FG44N determine what actions would 
potentially improve performance; and 

(2) the Resource Manager FG42 has final authority to determine whether to
implement the requested actions.

It should be mentioned that a Request Acknowledge message from the
Resource Manager FG42 has been defined and implemented within the QoS Manager
code. This message is intended to provide feedback to the QoS Manager indicating that the
request has been successfully received and whether the Resource Manager FG42 intends to
implement the request.

As previously mentioned, the QoS Managers FG44A - FG44N receive application
status and state information from the Program Control function FG50. Program Control
periodically sends application status updates for all running applications and also sends 
immediate indications of any applications which have been started or stopped. This 
information is used by the QoS Managers FG44A-FG44N, along with the instrumented 
performance data being received via the QoS Monitor FG29 and Instrumentation Correlators 
FG26A-FG26N, to determine the exact state of the monitored applications Al-NM that are 
running. This information is also used to determine when (and if) requested actions have 
been implemented by the Resource Manager FG42. The information is also used for setting 
up and discarding internal data structures used for monitoring the performance of each 
application. 

The QoS Managers FG44A - FG44N also receive application-level instrumentation 
data indicating current application performance values from the Instrumentation Correlators 
FG24A-FG24N, the Instrumentation Brokers FG26A-FG26N, and/or the Jewel 
Instrumentation Broker (QoS Monitor) FG29. The instrumentation data that is received 
contains (at a minimum): 

(1) the timetag indicating when the data was generated;

(2) the hostname and IP address of the host where the application that the data 
is associated with is running; 

(3) the process id (pid) of the application that the data is associated with; and 

(4) the event number of the instrumentation message. 

The event number of the instrumentation message specifies the type of instrumentation data
that has been received, and the hostname, IP address, and pid are used, in conjunction with
the application data received from Program Control, to determine the specific application that
the data is associated with.
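A sketch of how the minimum header fields might be matched against application data from Program Control follows. The field names, the `ApplicationTable` class, and the choice to key on hostname and pid (the IP address could serve the same purpose) are illustrative assumptions.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Minimum header fields listed above; field names are illustrative.
struct InstrumentationHeader {
    double timetag;             // when the data was generated
    std::string hostname;
    std::string ipAddress;
    int pid;
    std::uint32_t eventNumber;  // identifies the kind of instrumentation data
};

// Hypothetical table built from Program Control start/stop indications:
// (hostname, pid) -> application name.
class ApplicationTable {
public:
    void onStarted(const std::string& host, int pid, const std::string& app) {
        table_[std::make_pair(host, pid)] = app;
    }
    void onStopped(const std::string& host, int pid) {
        table_.erase(std::make_pair(host, pid));
    }

    // Returns the application a message belongs to, if it is known.
    std::optional<std::string> lookup(const InstrumentationHeader& h) const {
        auto it = table_.find(std::make_pair(h.hostname, h.pid));
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::pair<std::string, int>, std::string> table_;
};
```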

If the contents of the instrumentation message match any of the application
performance requirements that are currently being monitored by the QoS Manager, the data
value is added to the proper requirement sliding window for the specified application. The
sliding window algorithm is then checked to determine if the new sample triggered a
violation of either the high or low sliding window. If a high threshold sliding window
violation occurs and the application does not already have the maximum number of copies
running, a determination is made as to whether performance can be best improved by starting
a new application copy (scale up) or by moving an existing copy to a different host. The
corresponding action recommendation will then be sent to the Resource Manager. In an
exemplary case, the criterion for determining whether an application should be moved rather
than scaled up is based on the relative performance of the replicated applications. Thus, if one
application copy is performing much worse (by more than 50%) than the other copies, the
recommendation will be to move the application. Likewise, if the new sample triggers a low
threshold sliding window violation, and the application has more than the minimum number
of copies running, a recommendation will be sent to the Resource Manager FG42 requesting
that the copy of the application that is experiencing the worst performance be scaled down.
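The move, scale up, and scale down rules above can be expressed as a single decision function. This is a sketch under stated assumptions: the measured value is one where larger is worse (e.g. latency), and "much worse (by more than 50%)" is interpreted as the worst copy exceeding 1.5 times the mean of the other copies. `CopyPerf`, `Recommendation`, and `recommend` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Latest measured value of one requirement for each running copy (larger = worse).
struct CopyPerf {
    std::string host;
    double value;
};

enum class Action { None, ScaleUp, Move, ScaleDown };

struct Recommendation {
    Action action = Action::None;
    std::string host;  // copy to move or shut down (empty for a plain scale up)
};

// Assumes "much worse" means more than 1.5x the mean of the other copies.
Recommendation recommend(bool highViolation, bool lowViolation,
                         const std::vector<CopyPerf>& copies,
                         std::size_t minCopies, std::size_t maxCopies) {
    if (copies.empty()) return {};
    auto worst = std::max_element(copies.begin(), copies.end(),
        [](const CopyPerf& a, const CopyPerf& b) { return a.value < b.value; });

    if (highViolation && copies.size() < maxCopies) {
        if (copies.size() > 1) {
            double sum = 0.0;
            for (const auto& c : copies) sum += c.value;
            const double othersMean =
                (sum - worst->value) / static_cast<double>(copies.size() - 1);
            if (worst->value > 1.5 * othersMean) return {Action::Move, worst->host};
        }
        return {Action::ScaleUp, ""};
    }
    if (lowViolation && copies.size() > minCopies) {
        return {Action::ScaleDown, worst->host};  // retire the worst performer
    }
    return {};
}
```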

It will be appreciated from the discussion above that when a copy of the QoS 
Manager is first started, it reads in the System Specification Files FG32 (via calls to System 
Specification Library (SSL) FG34), which contain the list of hosts that are known to be in 
the distributed environment and information on all applications that can be run in the 
distributed environment. The application-level information includes where specific 
applications can be run, which applications are scalable, which applications can be restarted, 
and any dependencies between applications. 
It should also be mentioned that the Resource Manager FG42 receives updated
application survivability specifications from the QoS Specification Control component. This
information overrides the application survivability information that was initially loaded in
from the System Specification Files for the specified application. The information is used by
the Resource Manager FG42 to determine whether the specific application will be restarted
if it fails at run-time.

As described above, the QoS Managers FG44A - FG44N communicate with the
Resource Manager FG42, Program Control FG50, the QoS Specification Control (not
shown), the QoS Monitor FG29, an Instrumentation Correlator (generally denoted FG24),
a Graph Tool Instrumentation Display (generally denoted FG69), and the History Servers
FG12A-FG12N using the RMComms middleware. The message formats and contents of
each message that is exchanged between the QoS Managers FG44A - FG44N and these other
functional components are described in greater detail in CD-Appendix F. Additional details
regarding the timing and/or event trigger for each message are also described in the Appendix.

FG3: SYSTEM SPECIFICATION LANGUAGE & SYSTEM SPECIFICATION LIBRARY (SSL) FUNCTIONS

In order to effectively manage a pool of computing resources, the Resource Manager 
FG42 requires some means or mechanism of determining the capabilities and configuration 
of the computing resources under its control, as well as the software components that need 
to be executed and the dependencies of these software components on both hardware and 
software resources. Additionally, the Resource Manager FG42 requires the capability to 
determine the expected mission-level and application-level requirements. Furthermore, the
Resource Manager FG42 must be able to determine what control capabilities are available 
to be used to attempt to recover from fault or QoS violation conditions. 
In order to address these needs, a System and Software Specification Grammar has
been developed to capture the "static" information needed by the Resource Manager FG42 
for effectively managing a pool of distributed resources. The grammar captures the following 
information: 

• Hardware and Operating Systems 

• Hardware Configuration 

• Network Configuration 

• Operating System and Version 

• Software 

• Systems, Subsystems, Applications, Processes 

• Resource Requirements 

• QoS Requirements (Events) 

• Survivability Requirements 

• Path Information: Structure and QoS Requirements 

As part of the grammar development effort, a specification library has also been 
developed that parses the specification files and provides an API for accessing the 
specification information. It will be noted that the specification library was written in C++
and has been ported to all development platforms, including Solaris 2.6, Solaris 2.7, IRIX 6.5,
HP-UX 10.20, Red Hat Linux 6.0, and Windows NT 4.0. The library advantageously can be
used by substantially all of the Resource Management functional elements, including
Program Control FG50, Resource Manager FG42, Path QoS Managers, Hardware Broker
FG40, and History Servers FG12A-FG12N.

As illustrated in FIG. 3, the API library consists of a yacc file FG302 that defines the
BNF grammar, a lex file FG304 that defines the tokens of the language, and a set of C++
classes FG306 that store the spec file information. The lex file FG304 is compiled with the
GNU tool flex FG310, which creates a C++ source file FG320. The GNU tool bison FG312
compiles the yacc file FG302 and creates C++ source and header files FG322 and FG324.
It will be noted that the lex source file FG304 includes the yacc header file FG322. The C++
compiler FG314 then compiles these two source files to create lex and yacc objects FG330
and FG332. The C++ compiler FG314 also compiles the C++ storage classes FG334. All of
these objects are linked into a single library FG34 to be utilized by an application. FIG. 3
illustrates this process flow.

The Software Specifications Grammar (SSG) provides techniques for describing the 
characteristics and requirements of dynamic, path-based real-time systems as well as 
providing abstractions to describe the properties of the software, such as hierarchical 
structure, inter-connectivity relationships, and run-time execution constraints. The SSG also 
allows description of the physical structure or composition of the hardware such as LANs, 
hosts, interconnecting devices or ICs (such as bridges, hubs, and routers), and their statically 
known properties (e.g., peak capacities). Furthermore, the Quality-of-Service (QoS) 
requirements on various system components advantageously can be described. 

At the highest level, a specification consists of a collection of software systems, 
hardware systems, and network systems. The language rules for specifying systems are 
described generally below and in detail in CD-Appendix G. The system specification 
language hierarchy is shown below; selected details will be presented immediately following. 

• Software Specifications
    • Application
        • Security
        • Configuration
        • Hardware Requirements
        • Startup Info
        • Dynamic Arguments
        • Shutdown Info
        • States
        • Dependencies
        • Initial Load Estimate
        • QoS Info
        • Survivability
        • Scalability
• Hardware Specifications
    • Host Info
    • Network Info
        • LANs
        • Network Devices (Interconnects)
• Path Specifications
    • Data Flow Graph
    • Data Flow Info
    • QoS Requirements
• QoS Requirements 

It will be appreciated that a software specification is a collection of software systems, 
each of which consists of one or more software subsystems. Specification files are provided 
by the developer to capture as much knowledge about their software system as possible. 
These files provide a model of the actual systems which can be used by the Resource 
Manager FG42 at run-time. 

In contrast, an application is an executable program that can be started as an
autonomous process on a host. Application attributes include all information necessary to
start up and shut down the application. The associated startup and shutdown blocks
describe how to start and stop the application and include information such as the directory
and name of the application, command line options, and environment variable settings.

An application instantiates an SSL object by calling its constructor. This parses the
spec files in the specified directory and populates the object hierarchy to provide the data to
the application. The SSL class contains an SSL_Container member that holds the spec file
data in its lists and maps. All the systems from the spec files are contained in the appropriate
list: software systems in swSysList, hardware systems in hwSysList, and network systems
in nwSysList. The pathList contains all the paths in the spec files. The hostList contains all
the hosts in the spec files; this list is also available from the entries in hwSysList. The
processList contains a list of processes from the CONFIGURATION block. Moreover, it
should be noted that one or more configuration blocks can exist per application. For example,
an application that runs on more than one platform would have multiple CONFIGURATION
blocks with different platforms in each HARDWARE block.
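The container described above might be laid out roughly as follows. Only the list names (swSysList, hwSysList, nwSysList, pathList, hostList, processList) come from the text; the element types, the `hostByName` index, and `addHost` are placeholders invented for illustration.

```cpp
#include <list>
#include <map>
#include <string>

// Placeholder record types; the real SSL classes carry far more detail.
struct SystemEntry  { std::string name; };
struct PathEntry    { std::string name; };
struct HostEntry    { std::string name; std::string os; };
struct ProcessEntry { std::string application; std::string processName; };

// Sketch of a container shaped after the member lists named in the text.
struct SSL_Container {
    std::list<SystemEntry> swSysList;     // software systems
    std::list<SystemEntry> hwSysList;     // hardware systems
    std::list<SystemEntry> nwSysList;     // network systems
    std::list<PathEntry> pathList;        // all paths
    std::list<HostEntry> hostList;        // all hosts (also reachable via hwSysList)
    std::list<ProcessEntry> processList;  // processes from CONFIGURATION blocks
    std::map<std::string, const HostEntry*> hostByName;  // hypothetical index

    void addHost(const HostEntry& h) {
        hostList.push_back(h);
        // std::list never relocates its elements, so this pointer stays valid
        // for as long as this container object (not a copy of it) is alive.
        hostByName[h.name] = &hostList.back();
    }
};
```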

The application startup block contains all the information necessary to start an
application, either automatically or manually. This information includes the supported hardware
(host) type, operating-system type, and operating-system version(s). This may be further
constrained by an optional list of the names of hosts that can run the application. The startup
information also includes the working directory for reading and writing data files, the name of the
executable, and an ordered list of arguments that must be passed on the command line when
the application is started. Last is a list of processes expected to be seen on the system when
the application is running.

An application shutdown block indicates the command(s) to be used for termination
of the application. A shutdown command may be a POSIX signal name or may be a shell
script or batch file. Supported signals include SIGKILL, SIGQUIT, SIGHUP, SIGUSR1,
SIGUSR2, SIGSTOP, SIGINT, and SIGTERM. The ShutdownTime parameter is the
maximum time to wait for an application to die gracefully before forcing the application
to terminate via the SIGKILL signal.
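The ShutdownTime behavior (send the configured signal, wait, then force termination with SIGKILL) can be sketched with POSIX calls. `shutdownApplication` is a hypothetical helper, not part of the SSL API, and it assumes the caller is the parent of the target process so that `waitpid` can observe its exit.

```cpp
#include <chrono>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Sends the configured shutdown signal, waits up to shutdownTime for the
// process to exit, then falls back to SIGKILL. Returns true only if the
// process exited on its own within shutdownTime.
bool shutdownApplication(pid_t pid, int shutdownSignal,
                         std::chrono::milliseconds shutdownTime) {
    if (kill(pid, shutdownSignal) != 0) return false;  // could not signal it
    const auto deadline = std::chrono::steady_clock::now() + shutdownTime;
    while (std::chrono::steady_clock::now() < deadline) {
        int status = 0;
        if (waitpid(pid, &status, WNOHANG) == pid) return true;  // exited gracefully
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    kill(pid, SIGKILL);        // grace period expired: force termination
    waitpid(pid, nullptr, 0);  // reap the killed process
    return false;
}
```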

Other blocks are available. For example, a dependency block indicates any 
dependencies the application may have with the startup and/or shutdown of other 
applications (e.g., it may be required that a particular application be started before another 
application can be started). It will be noted that the dependency block is used by both 
Application Control FG50 and the Resource Manager FG42 to determine whether or not it 
is safe to start an application, stop an application, or let an application continue to run. 
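One way to turn "start X before Y" dependencies into a safe startup order is a topological sort. The sketch below uses Kahn's algorithm; the input representation and the function name `startupOrder` are assumptions, since the dependency block syntax itself is defined in CD-Appendix G.

```cpp
#include <map>
#include <queue>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Computes a startup order from (before, after) pairs meaning "before must be
// started before after", using Kahn's topological sort. Throws on cycles.
std::vector<std::string> startupOrder(
        const std::set<std::string>& apps,
        const std::vector<std::pair<std::string, std::string>>& startsBefore) {
    std::map<std::string, std::vector<std::string>> successors;
    std::map<std::string, int> pending;  // unmet prerequisites per application
    for (const auto& a : apps) pending[a] = 0;
    for (const auto& [before, after] : startsBefore) {
        pending.emplace(before, 0);  // no-op if already present
        successors[before].push_back(after);
        ++pending[after];
    }
    std::queue<std::string> ready;
    for (const auto& [app, n] : pending)
        if (n == 0) ready.push(app);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string app = ready.front();
        ready.pop();
        order.push_back(app);
        for (const auto& next : successors[app])
            if (--pending[next] == 0) ready.push(next);
    }
    if (order.size() != pending.size())
        throw std::runtime_error("cyclic application dependencies");
    return order;
}
```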

The scalability specification for an application indicates whether an application can
be scaled via replication. Scalable applications are programmed to exploit load sharing
among replicas, and can adapt dynamically to varying numbers of replicas. The specification
also indicates whether an application combines its input stream (which may be received from
different predecessor applications and/or devices) and whether it splits its output stream
(which may be distributed to different successor applications and/or devices).
"Combining" and "splitting" are commonly called "joining" and "forking", respectively, in
parallel computing paradigms.

Specification files advantageously can be provided to describe a given set of networks 
that exist in a distributed runtime environment. A network system specification describes the 
LANs and ICs (interconnection devices such as switches, hubs and routers). A system
consists of one or more subsystems. A subsystem may contain LANs (each with an 
associated peak bandwidth specification) and ICs (each containing a description of network 
membership). 

Advantageously, a real-time QoS requirement specification includes timing
constraints such as simple deadlines, inter-processing times, and throughputs. A simple
deadline is defined as the maximum end-to-end path latency during a cycle from the
beginning to the end of the path. Inter-processing time is defined as the maximum allowable
time between processing of a particular element in the path. The throughput requirement is
defined as the minimum number of data items that the path must process during a unit period
of time. Each timing constraint specification may also include items that relate to the
dynamic monitoring of the constraint. These include minimum and maximum slack values
(that must be maintained at run-time), the size of a moving window of measured samples that
should be observed, and the maximum tolerable number of violations (within the window).
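The three timing constraint kinds and the slack bounds can be illustrated as follows. The sketch assumes slack means the distance between the measured value and its limit, and interprets exceeding the maximum slack as a sign of over-provisioning; these interpretations, and all names below, are hypothetical. Each classified sample would then feed a moving window like the one used by the QoS Manager.

```cpp
// Illustrative form of the three real-time constraint kinds described above.
struct TimingConstraint {
    enum class Kind { Deadline, InterProcessing, Throughput } kind;
    double limit;     // max latency / max inter-processing time (s), or min items per second
    double minSlack;  // slack below this is "too close" to violating the limit
    double maxSlack;  // slack above this is assumed to mean over-provisioning
};

// measured: latency or inter-processing time in seconds, or, for throughput,
// data items processed per second.
double slack(const TimingConstraint& c, double measured) {
    return c.kind == TimingConstraint::Kind::Throughput
               ? measured - c.limit   // processed more than the minimum
               : c.limit - measured;  // finished earlier than the maximum
}

enum class SlackState { TooLittle, Ok, TooMuch };

SlackState classify(const TimingConstraint& c, double measured) {
    const double s = slack(c, measured);
    if (s < c.minSlack) return SlackState::TooLittle;
    if (s > c.maxSlack) return SlackState::TooMuch;
    return SlackState::Ok;
}
```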

CD-Appendix G describes a specification grammar for declaring requirements on
applications in a dynamic, distributed, heterogeneous resource pool. The grammar allows the 
description of environment-dependent application features, which allows for the modeling 
and dynamic resource management of such systems. 

A common API was developed to allow Resource Management functions to access
the information contained in the spec files. This object-oriented API is, in an exemplary
case, written in C++, with libraries ported to all supported platforms. The object is populated
by parsing the spec files using the BNF grammar defined by lex and yacc syntax and
compiled with GNU tools flex and bison, as discussed above. Actual population occurs in
the semantic actions of the yacc file.

The SSL_System class is a generic class that can hold data for a software system, 
hardware system, or network system. The type member describes the type of system it 
contains. It also contains a pointer to its parent (which allows for nested systems of the same
type) and the name of the system. The sysList contains its SSL_System children, and compList
contains a list of the system's components (a list of hosts, for a hardware system for 
example). 
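A simplified sketch of the generic system node follows. The member names type, parent, name, sysList, and compList follow the text; the use of `std::unique_ptr`, plain-string components, and the `addSubsystem` and `totalComponents` helpers are illustrative assumptions.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Simplified generic system node; components are reduced to plain strings.
class SSL_System {
public:
    enum class Type { Software, Hardware, Network };

    SSL_System(Type type, std::string name, SSL_System* parent = nullptr)
        : type(type), name(std::move(name)), parent(parent) {}

    // Creates a nested subsystem of the same type, owned by this system.
    SSL_System& addSubsystem(const std::string& childName) {
        sysList.push_back(std::make_unique<SSL_System>(type, childName, this));
        return *sysList.back();
    }

    // Counts components in this system and all nested subsystems.
    std::size_t totalComponents() const {
        std::size_t n = compList.size();
        for (const auto& child : sysList) n += child->totalComponents();
        return n;
    }

    Type type;
    std::string name;
    SSL_System* parent;                              // nullptr for a top-level system
    std::list<std::unique_ptr<SSL_System>> sysList;  // child systems
    std::list<std::string> compList;                 // e.g., host names for hardware
};
```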
Preferably, the Application Program Interface (API) for the System Specification
Library (SSL) FG34 uses the C++ Standard Template Library for data structures such as
linked lists and hash tables (maps). An application first instantiates the SSL object by calling
its constructor with the name of the directory where the specification files reside. This object
contains functions that allow setting this directory after calling its constructor
(setSpecDir(directory name)), clearing the object of all currently held data (clear()), parsing
a specific file (parseSpec(filename)), and rebuilding the object (rebuild(), which implicitly
clears the object first). Once instantiated, this object provides access to the data in the
specification files. CD-Appendix G provides additional discussion regarding this aspect of the
SSL. It will be appreciated that the SSL object provides methods that return all the data it
contains. For example, getSWSystems() returns an STL list of all the software systems
specified in the specification files. Each entry in this list provides its data by methods such
as getSysName(), and the set of application components (ApplicationSpec) that make up the
system. All data can be retrieved in this manner.

FG1: HOST AND NETWORK MONITORING FUNCTIONAL GROUP

As mentioned above, extensive monitoring capabilities are provided in the Resource 
Management architecture at the host and network levels. The information monitored includes 
statuses, configuration information, performance metrics, and detected fault conditions. 
The Host and Network Monitoring functional group FG1 consists of four components:

1) Host Monitors FG10A-FG10N, which reside on each machine in the
distributed environment, collect extensive operating system-level data for
each host (CPU and memory usage, etc.), and provide it to the History Servers
via the RMComms TCPCommServer middleware.

2) History Servers FG12A-FG12N that collect data from the Host Monitors via an
RMComms TCPCommClient, maintain status and performance histories on each
host in the distributed environment, and provide this information to displays
and other Resource Management components using an RMComms
TCPCommServer.

3) A Host Discovery function FG14 that uses SNMP (Simple Network
Management Protocol) calls and ICMP ping calls to determine when new
hosts come on-line and if existing hosts go down, and provides this
information to Program Control via an RMComms TCPCommServer.

4) A Remos Network Data Broker FG16 that collects information on network
link bandwidths from Carnegie Mellon University's SNMP-based Remos tool
and passes this information by way of an RMComms TCPCommServer to the
Host Load Analyzer component of the Resource Allocation Decision-Making
subsystem.

It will be appreciated that network information is collected both by the Remos broker
FG16 and indirectly via the Host Monitors FG10A-FG10N. See FIGS. 2A, 2B. The Remos
Broker FG16 accesses the Remos network information via the Remos API. As mentioned
previously, Remos uses SNMP calls to the LAN switches and hosts. The Host Discovery
function FG14 uses both SNMP and ICMP (ping) calls to each host A-N to determine if
new hosts have come on-line or previously discovered hosts have gone down. The
Host Monitors FG10A-FG10N employ operating system calls to gather host and network
performance statistics. Internally, the History Servers FG12A-FG12N collect data from the
Host Monitors FG10A-FG10N. The Monitoring functional group provides its information
to the rest of the Resource Management components using RMComms TCPCommServer
objects, which are discussed in detail elsewhere. The Remos Broker FG16 sends data to the
Host Load Analyzer FG40, the History Servers FG12A-FG12N send data to the Display
functional group FG6 and Host Load Analyzer FG40, and the Host Discovery function FG14
provides Program Control FG50 with information on detected or faulted hosts. Additional
details on these functional elements are provided immediately below.

FG10A-FG10N: Host Monitors

For monitoring the status and performance of hosts, a Host Monitor process runs on
each machine within the distributed environment. These Host Monitors FG10A-FG10N use
operating system-level mechanisms to retrieve status, configuration, and performance
information for each host A-N. The information retrieved includes 1) operating system
version and machine configuration, 2) CPU configuration, status, and utilization, 3) memory
configuration and usage, 4) network configuration, status, and utilization, 5) filesystem
configuration, status, and utilization, and 6) process statuses including CPU, memory,
network, and filesystem utilization for each process. While the Host Monitors are primarily
responsible for monitoring the status of a particular host, they also provide information on
network load as seen by that host. In the same manner, the Host Monitors FG10A-FG10N
also provide information and statistics concerning any remotely mounted filesystems
(e.g., NFS).

Preferably, the information the Host Monitors FG10A-FG10N collect is formatted
into operating system-independent message formats. These message formats attempt to
provide a pseudo-standardized set of state, status, and performance information which is
useful to other components of the Resource Management architecture, so that other
components do not have to be aware of or deal with the minor differences between data
formats and semantics. Since not all the state and performance data is available on every
platform, a group of flags is set in the host configuration message to indicate whether
specific data items are valid on a particular platform.
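A bitmask is one natural encoding for these per-platform validity flags. The flag names and bit positions below are hypothetical; the actual message layout is not given here.

```cpp
#include <cstdint>

// Hypothetical bit assignments for the "data item is valid on this platform"
// flags carried in the host configuration message.
enum HostDataFlag : std::uint32_t {
    kCpuUtilValid         = 1u << 0,
    kMemoryUsageValid     = 1u << 1,
    kPagingValid          = 1u << 2,
    kNetworkStatsValid    = 1u << 3,
    kFilesystemStatsValid = 1u << 4,
};

// Consumers test a flag before trusting the corresponding data field.
constexpr bool isValid(std::uint32_t flags, HostDataFlag item) {
    return (flags & item) != 0;
}
```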

It will be appreciated that the Host Monitors FG10A-FG10N have a very specific
interface with the History Servers FG12A-FG12N. Each Host Monitor periodically (once a
second) sends its data to all History Servers connected to it (this is transparent, a property
of the RMComms TCPCommServer); the History Servers make no requests to the Host Monitors.

More specifically, the Host Monitors FG10A-FG10N have been designed and
implemented in C++. This decision allows for a completely modular design in which
platform-specific code can be restricted to a small number of modules. This approach
alleviates many of the problems associated with porting to various platforms. Currently there
is support for Sun SPARC based architectures running Solaris 2.6 and 2.7, Silicon Graphics
MIPS based architectures running IRIX 6.5, Hewlett Packard PA-RISC based architectures
running HP-UX 10.20, and Pentium based architectures running both WinNT 4.0 Workstation
and Red Hat Linux 6.0. The Host Monitor source compiles under the native compilers provided
by Sun Microsystems and Silicon Graphics for their respective platforms. The GNU C++
compiler (version 2.8.1) may also be used on Hewlett Packard PA-RISC based architectures
under HP-UX 10.20 and on Red Hat Linux. Microsoft Visual C++ compiles the Windows NT
Host Monitor. All Host Monitors utilize the I/O library package supported by the Resource
Management (RM) group under the NSWC's High Performance Distributed Computing
(HiperD) initiative.

The Host Monitors FG10A-FG10N accumulate data on a periodic interval specified
at invocation. System process table data is accumulated and then filtered to eliminate
"uninteresting" processes (usually meaning processes belonging to user ID 0 or 1). It is
important to note that system-wide data is accumulated and processed before the filtering
stage, so as to ensure a complete picture of system-wide performance. This system-wide data,
along with the filtered process list, is then made available to the I/O module for subsequent
transmission to client applications.
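The point about accumulating system-wide data before filtering can be shown with a short sketch. `ProcessInfo`, `Snapshot`, and `summarize` are hypothetical, and the UID 0/1 rule mirrors the "usually" case in the text rather than a fixed policy.

```cpp
#include <string>
#include <vector>

// Illustrative process-table row.
struct ProcessInfo {
    int pid;
    int uid;
    std::string name;
    double cpuPercent;
};

struct Snapshot {
    double totalCpuPercent = 0.0;       // computed over *all* processes
    std::vector<ProcessInfo> reported;  // filtered list sent to clients
};

// System-wide totals are accumulated before filtering, so dropping the
// "uninteresting" processes (UID 0 or 1 here) does not distort them.
Snapshot summarize(const std::vector<ProcessInfo>& table) {
    Snapshot s;
    for (const auto& p : table) s.totalCpuPercent += p.cpuPercent;
    for (const auto& p : table)
        if (p.uid != 0 && p.uid != 1) s.reported.push_back(p);
    return s;
}
```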

FG12A-FG12N: History Servers 

The History Servers FG12A-FG12N are responsible for collecting information from
the Host Monitors and maintaining histories on the statuses, statistics, and performance of
each host in the distributed environment. This information can be requested by other
Resource Management functional groups. Currently, the primary consumers of the status
information are the Host Load Analyzer (Hardware Broker) FG40 component of the
Resource Allocation Decision-Making functional group FG4, the Host Displays FG62A-FG62N,
and the Path Display FG64. The Host Load Analyzer FG40 receives information on
host configuration and loads (primarily CPU, memory, and network data) and uses this to
assign host fitness scores. The Host Displays FG62A-FG62N receive and display current host
status information, process status information, and network connectivity information. It
should be mentioned that the Host Display can also request that the History Servers provide
CPU load information, network load information, paging activity data, and memory
utilization information, which is used to drive line graph charts for specific hosts selected at
the Host Display.

The History Servers FG12A-FG12N are designed so that multiple copies can be run
simultaneously. Each History Server can be configured either to monitor all Host Monitors
FG10A-FG10N or to monitor only a selected subset of the Host Monitors. It will be noted
that the History Servers FG12A-FG12N determine the list of hosts in the distributed
environment that could potentially be monitored from the System Specification Library
(SSL). In this manner, the History Servers FG12A-FG12N can be used to provide
survivability (by having multiple History Servers FG12A-FG12N connected to each Host
Monitor) and/or to perform load-sharing (with the History Servers FG12A-FG12N each
monitoring only a subset of the Host Monitors). The History Servers FG12A-FG12N can also
be configured to periodically record history data to disk. These disk files can then be used
for off-line analysis.

The History Server function of Resource Management acts as a data broker between
daemons monitoring individual hosts, known as host monitors FG10A-FG10N, and other
functional components of Resource Management. The host monitors collect performance
information (such as CPU utilization and process status data) from hosts of various platforms
(SGI, Sun, HP, Windows NT, and Linux). The host monitors use an RMComms
TCPCommServer object to distribute this data. For further information, refer to the host
monitor and RMComms documentation. The History Servers FG12A-FG12N collect and
store this data from the host monitors FG10A-FG10N and distribute it to other Resource
Management clients, such as the Host Displays FG62A-FG62N, Graph Displays FG69A-FG69N,
Path Display FG64, and the Hardware Broker FG40.

Each History Server has two modes of operation relating to fault tolerance, 
scalability, and workload distribution between multiple instances of History Servers. The 
first mode determines at initialization (through command line arguments or default) the set 
of hosts to monitor, and this set remains static for the life of the History Server process. The 
second mode recognizes the existence of other History Server processes and coordinates 
between them. It allows for dynamic changing of the set of hosts each History Server 
monitors (for example, two History Servers each monitor half of the hosts; when a third History
Server starts, all three History Servers reconfigure to each monitor one third of the
hosts). This also allows a History Server to preserve the data it has collected by sending it to
the others, providing fault tolerance.
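The second mode requires every History Server to agree on which server monitors which host. The sketch below assumes one simple rule, round-robin assignment over sorted server and host names, so that each server computes the same partition independently; `partitionHosts` is a hypothetical name, and the actual coordination protocol between History Servers may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Deterministic rebalancing: every server sorts the same inputs the same way,
// so all of them derive an identical round-robin assignment.
std::map<std::string, std::vector<std::string>> partitionHosts(
        std::vector<std::string> servers, std::vector<std::string> hosts) {
    std::sort(servers.begin(), servers.end());
    std::sort(hosts.begin(), hosts.end());
    std::map<std::string, std::vector<std::string>> assignment;
    if (servers.empty()) return assignment;
    for (const auto& s : servers) assignment[s];  // every server gets an entry
    for (std::size_t i = 0; i < hosts.size(); ++i)
        assignment[servers[i % servers.size()]].push_back(hosts[i]);
    return assignment;
}
```

When a third server joins, each server simply recomputes the assignment with the larger server list; the hosts that change owner are the ones whose collected history would be handed over.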

The History Server function is written in C++ with an object-oriented design. The
main routine processes the command line arguments, retrieves the list of hosts to monitor
using an SSL object, instantiates the main History_Server object, and spawns the Collector,
Distributor, Communicator, and Display threads. These threads share the main History_Server
object. The Collector thread is responsible for collecting and storing data from the host
monitors. The Distributor thread processes requests from RM clients. The Communicator
thread waits for events with other History Servers and takes appropriate actions, including
triggering the Display thread to update the History Server Display.

FG14 - Host Discovery 

The Host Discovery function FG14 advantageously can use a Perl script that makes
SNMP (Simple Network Management Protocol) calls and ICMP ping calls. These calls are
used to periodically scan each subnet and host address in the distributed environment to
attempt to determine whether there have been any host status changes. In an exemplary case,
the list of hosts and subnets that are to be monitored is read in from a file.

The Host Discovery function FG14 issues MIB-II SNMP queries to obtain information on the
hosts A-N on the network. When a new host is first detected, the new host's operating system
configuration is queried via SNMP calls. Information on the newly discovered host and its
operating system configuration is then sent to the Program Control function FG50. Likewise,
when a host fails to respond to multiple SNMP and ping queries, a message indicating that
the host appears to have gone down is sent to the Program Control function.
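The up/down decision described above can be sketched as a small state tracker fed with the set of hosts that answered each scan. The threshold of missed scans and the event names are assumptions; the text says only that a host must fail "multiple" SNMP and ping queries before it is reported down.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class HostTracker:
    max_misses: int = 3  # hypothetical: consecutive missed scans before "down"
    misses: Dict[str, int] = field(default_factory=dict)
    up: Set[str] = field(default_factory=set)

    def scan(self, responding: Set[str], monitored: Set[str]) -> List[Tuple[str, str]]:
        """Compare one scan with the known state and return status-change events."""
        events: List[Tuple[str, str]] = []
        for host in sorted(monitored):
            if host in responding:
                self.misses[host] = 0
                if host not in self.up:
                    # New (or returning) host: its OS configuration would now
                    # be queried via SNMP and reported to Program Control.
                    self.up.add(host)
                    events.append(("host_up", host))
            elif host in self.up:
                self.misses[host] = self.misses.get(host, 0) + 1
                if self.misses[host] >= self.max_misses:
                    self.up.discard(host)
                    events.append(("host_down", host))
        return events
```

Requiring several consecutive misses keeps a single dropped ping or SNMP reply from being reported as a host failure.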

The Host Discovery function FG14 interfaces with Program Control FG50 using a
C++ wrapper class around the Perl script. This wrapper class contains an RMComms
TCPCommServer, making the data collected by the SNMP calls available to the rest of the
Resource Management components.

FG16 - Remos Network Data Broker 

The final functional component of the Host and Network Monitoring functional
group is the Remos Network Data Broker FG16, which receives information on network link
bandwidth and network link bandwidth utilization from the SNMP-based Remos network
monitoring tool, as shown in FIGS. 2A, 2B and/or FIG. 14. The network information is
accessed via the Remos API library and is then sent on to the Host Load Analyzer (Hardware
Broker) function FG40 of the Resource Allocation Decision-Making functional group FG4
using an RMComms TCPCommServer. Remos works by using SNMP to query the switches
(via the bridge collector) to collect information on network configuration as well as
bandwidth utilization on each link; it also issues SNMP MIB-II queries to each host to
collect the host's view of network utilization. The network information received from Remos
consists of the maximum potential bandwidth and the current bandwidth utilization on
specific host network links.

For each host, the Remos Broker FG16 reports this network link information (maximum
potential bandwidth and current bandwidth utilization) to the Host Load Analyzer (Hardware
Broker) approximately every 2 seconds. The Remos Broker FG16 uses configuration files
listing the specific hosts and switches that should be queried.

The functions implemented by the Host Monitor functional group FG1 have been
designed to provide a system monitoring capability not normally supplied by standard SVR4
or BSD Unix services. This capability includes cross-platform reporting of system process
loading, CPU performance, network performance, and periodic status summaries. The
Host Monitors were developed to support the efforts of the HiperD Resource Management
group, with the aims of providing a common set of OS-level parameters useful for assessing
host and network load and status and for supporting resource allocation/reallocation
algorithms, and of providing a minimally intrusive, close to real-time capability for gathering
this data.

Host Discovery Design 

The Host Discovery function FG14 of the Resource Management architecture
provides resource discovery of hosts on a network. It identifies new hosts that come online
or previously known hosts that have gone offline. The Host Discovery component can
determine the hostname, the operating system name and version, and in some cases the
machine architecture and manufacturer of a newly discovered host. This information is sent
to Program Control so the new host can be added to the pool of resources.
The Host Discovery functional element FG14 consists of a Perl script that contains
the resource discovery functionality, and a C++ object that receives the output of the Perl
script and provides this information to Program Control via an RMComms TCPCommServer
connection. This is described in CD-Appendix H. More specifically, the Perl script
host_discovery.pl issues ICMP (ping) calls and MIB-II SNMP queries to discover new hosts.
On initialization, the script populates a data structure called Net_info for each of the
networks (subnets) it needs to monitor. Currently this information is hard-coded: the subnet
is defined as 172.30.1, and the lower and upper limits for the host are 1 and 254, respectively.
It then initializes the global variables for the server host and port, network domain, and the
executable path for the ping command.
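The Net_info record described above needs little more than a subnet prefix and a host range. A Python rendering follows (the original is the Perl module Net_info.pm, whose exact fields are not given here, so the field names are assumptions). It expands the hard-coded 172.30.1 subnet into the 254 candidate addresses scanned on each pass.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NetInfo:
    subnet: str      # first three octets, e.g. "172.30.1"
    low: int = 1     # lowest host number to scan
    high: int = 254  # highest host number to scan

    def addresses(self) -> List[str]:
        """Expand the subnet and host range into dotted-quad addresses."""
        result = []
        for n in range(self.low, self.high + 1):
            addr = f"{self.subnet}.{n}"
            ipaddress.IPv4Address(addr)  # raises ValueError if malformed
            result.append(addr)
        return result


networks = [NetInfo("172.30.1")]
```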

The host_discovery.pl script establishes a baseline of existing hosts using the current
set of hosts that answer the ping call. For each network/subnet defined in its list of Net_info
(Net_info.pm) data structures, it calls ping and builds a list of IP addresses of hosts that
answered the ping, known as reachable hosts, and a list of those hosts that did not answer
the ping. For each reachable host, a Host_info (Host_info.pm) data structure is populated to
store the host's information. (Key fields in the Host_info data structure include IP address,
hostname, operating system and version, architecture class, and manufacturer.) Since the IP
address of the reachable host is known, a call to gethostbyaddr() is used to get the hostname.
Other information for the host is obtained by making a MIB-II (Management Information Base
version 2) system group SNMP call (Object ID 1.3.6.1.2.1.1.1.0, the sysDescr object) to the
SNMP agent on each reachable host. This SNMP query returns information on the configuration
of a specific network device (in this case, the configuration of each reachable host).
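The baseline pass can be sketched as follows. Reachability testing and name lookup are passed in as functions so the logic can be exercised without a network; `build_baseline` and the HostInfo field names are hypothetical. The SNMP sysDescr query that would fill in the operating system, architecture, and manufacturer fields is left out.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class HostInfo:
    ip: str
    hostname: Optional[str] = None
    os_name: Optional[str] = None       # would be filled from the SNMP reply
    os_version: Optional[str] = None
    architecture: Optional[str] = None
    manufacturer: Optional[str] = None


def resolve_hostname(ip: str) -> Optional[str]:
    """Reverse lookup, the Python counterpart of gethostbyaddr()."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def build_baseline(addresses: Iterable[str],
                   ping: Callable[[str], bool],
                   resolve: Callable[[str], Optional[str]] = resolve_hostname,
                   ) -> Tuple[Dict[str, HostInfo], List[str]]:
    """Split addresses into reachable hosts (with HostInfo) and unreachable ones."""
    reachable: Dict[str, HostInfo] = {}
    unreachable: List[str] = []
    for ip in addresses:
        if ping(ip):
            reachable[ip] = HostInfo(ip=ip, hostname=resolve(ip))
        else:
            unreachable.append(ip)
    return reachable, unreachable
```

In practice `ping` might wrap the system ping executable (for example through `subprocess.run`); it is left abstract here because command-line flags differ between platforms.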

The host_discovery.pl script makes SNMP calls by using freely available (freeware)
subroutines created by Simon Leinen. These subroutines are contained in the
files BER.pm and SNMP_Session.pm. The SNMP session is configurable for specifying
timeouts and the number of retries before declaring a host unavailable, and for specifying the
SNMP Object ID (OID).

Additional general and specific details regarding functional elements of the Host and
Networking functional group FG1 are provided in CD-Appendix H.

FG2: Instrumentation functional group 

As mentioned above, the NSWC-DD Instrumentation System provides general-purpose
application event reporting and event correlation capabilities. The Instrumentation
system forms an architecture that allows instrumented application data to be easily accessible 
by other components of the Resource Management architecture. The major functional 
components of the Instrumentation System architecture are the following: 

1) The Instrumentation API Libraries, which are linked with the applications
and provide the function call interfaces by which the application sends 
instrumentation data. 

2) An Instrumentation Daemon, one copy of which resides on each host in the 
distributed environment and is responsible for reading instrumentation data 
sent by the applications, reformatting the data into instrumentation event 
messages and sending the messages to the Instrumentation Collectors. 

3) The Instrumentation Collectors, which connect to the Instrumentation 
Daemons on each host and receive instrumentation messages from all hosts. 
The Collectors forward received messages to the Instrumentation Correlators 
and Instrumentation Brokers. 

4) The Instrumentation Correlators, which receive instrumentation messages 
from the Instrumentation Collectors and provide grammar-driven capabilities 
for correlating, combining, and reformatting application data into higher-level 
metrics (composite events) for use by displays or other Resource 
Management components. 
5) The Instrumentation Brokers, which receive instrumentation messages
from the Instrumentation Collectors and perform task-specific reformatting 
and data manipulation for driving displays or other Resource Management 
components. 

6) The Jewel Instrumentation Broker (QoS Monitor), which is a legacy 
component that can receive instrumentation data from either the open source 
Jewel instrumentation package or from the Instrumentation Collectors. The 
QoS Monitor performs task-specific message reformatting and data 
manipulation for driving displays and the QoS Managers. 

Instrumentation API Library 

The applications link in the Instrumentation API Library and make API calls to
construct and send out instrumentation event messages. Three separate APIs are provided for
use by the applications: 1) a printf()-style API, which allows the code to format, build, and
send instrumentation data with a single function call; 2) a buffer-construction-style API,
in which multiple function calls are made to construct the instrumentation buffer iteratively,
one data element per call; and 3) a Jewel function call API based on the existing API
provided by the Jewel instrumentation package (an open-source package produced by the
German National Research Center for Computer Science). The first two APIs are the
preferred programming interfaces and take advantage of several key new instrumentation
features. It will be appreciated that the Jewel API is provided solely for backwards
compatibility with existing instrumented application code and is implemented as a set of
wrappers around the printf()-style API. All three APIs are supported for C and C++. Ada
bindings have been produced for the buffer-construction-style API and the Jewel function
call API.
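To make the difference between the API styles concrete, here is a Python sketch in which the printf()-style call and the buffer-construction calls build the same event. The function and class names, the format codes, and the (event name, field list) message shape are all hypothetical; the real API is C/C++, and its signatures are not reproduced in this section. The Jewel-style call is shown as a thin wrapper over the printf()-style call, mirroring the layering described above.

```python
from typing import Any, List, Tuple

Message = Tuple[str, List[Any]]  # hypothetical: (event name, ordered data fields)

_CONVERTERS = {"%d": int, "%f": float, "%s": str}


def instr_printf(event: str, fmt: str, *args: Any) -> Message:
    """printf()-style: one call formats and builds the whole event."""
    codes = fmt.split()
    if len(codes) != len(args):
        raise ValueError("format/argument count mismatch")
    return event, [_CONVERTERS[c](a) for c, a in zip(codes, args)]


class InstrBuffer:
    """Buffer-construction style: one data element per call."""

    def __init__(self, event: str) -> None:
        self.event = event
        self.fields: List[Any] = []

    def add_int(self, value: int) -> "InstrBuffer":
        self.fields.append(int(value))
        return self

    def add_float(self, value: float) -> "InstrBuffer":
        self.fields.append(float(value))
        return self

    def add_str(self, value: str) -> "InstrBuffer":
        self.fields.append(str(value))
        return self

    def build(self) -> Message:
        return self.event, list(self.fields)


def jewel_event(event: str, *values: Any) -> Message:
    """Backward-compatible call implemented on top of the printf()-style API."""
    return instr_printf(event, " ".join("%s" for _ in values), *values)
```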

The instrumented data is sent from the application to the Instrumentation Daemon on
the same host. The current mechanism for data transfer is the UNIX FIFO IPC (inter-process
communication) mechanism. The FIFO mechanism was chosen based on reliability, low
overhead, and ease of implementation. Future implementations of the Instrumentation system
may explore alternate data passing mechanisms, including shared message queues.
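A FIFO carries an unstructured byte stream, so the sender has to mark where one instrumentation record ends and the next begins. The sketch below assumes a 4-byte length prefix for that purpose (the actual format is defined in the Standard Instrumentation Message Format appendix, not here) and runs on POSIX systems only, because it relies on `os.mkfifo`. The writer thread plays the role of an instrumented application and the reader plays the Instrumentation Daemon.

```python
import os
import struct
import tempfile
import threading
from typing import List

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte big-endian length


def send_records(path: str, records: List[bytes]) -> None:
    """Instrumented application side: write length-prefixed records."""
    with open(path, "wb") as fifo:  # blocks until a reader opens the FIFO
        for record in records:
            fifo.write(HEADER.pack(len(record)) + record)


def read_records(path: str) -> List[bytes]:
    """Daemon side: read records until every writer has closed the FIFO."""
    received = []
    with open(path, "rb") as fifo:
        while True:
            header = fifo.read(HEADER.size)
            if len(header) < HEADER.size:  # EOF: the writer closed its end
                break
            (length,) = HEADER.unpack(header)
            received.append(fifo.read(length))
    return received


def round_trip(records: List[bytes]) -> List[bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "instrumentation.fifo")
        os.mkfifo(path)
        writer = threading.Thread(target=send_records, args=(path, records))
        writer.start()
        received = read_records(path)
        writer.join()
    return received
```

The sketch assumes a single writer. With several applications writing to one FIFO, each record should go out in one write() call of no more than PIPE_BUF bytes, since POSIX guarantees atomic pipe writes only up to that size; larger or split writes from different processes may interleave.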

Instrumentation Daemon 

An Instrumentation Daemon resides on each host in the distributed environment. The
Instrumentation Daemon is interrupted when new data is written to the FIFO. The
Instrumentation Daemon reads the data from the FIFO, reformats the data into the
standard internal Instrumentation message format, and sends the data to each of the
Instrumentation Collectors that are currently active. (For future implementations, an event
request filtering mechanism will be implemented so that specific event messages will only
be sent to those Instrumentation Collectors that have requested the message.)

Instrumentation Collectors 

The Instrumentation Collectors receive instrumentation messages from the 
Instrumentation Daemons on each host in the distributed environment. Currently, the 
Instrumentation Collectors send every instrumentation message to all Instrumentation 
Brokers and Instrumentation Correlators that have connected to the Instrumentation 
Collector. (For future implementations, an event request filtering mechanism will be
implemented so that specific event messages will only be sent to those Instrumentation
Brokers and Instrumentation Correlators that have requested the message. For now, the
Instrumentation Collector serves as a pass-through server for instrumentation messages. The
Instrumentation Collector does support architecture scalability in the sense that, without the
Instrumentation Collectors, each Instrumentation Broker and Instrumentation Correlator
would need to maintain connections to the Instrumentation Daemons on every host.)
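The collector's current pass-through behaviour and the planned request filtering differ only in whether each connected Broker or Correlator supplies a set of requested message types. A hypothetical Python sketch follows (class and method names and the dictionary message shape are assumptions):

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Message = Dict[str, object]  # hypothetical shape, e.g. {"type": "latency", "value": 5}
Deliver = Callable[[Message], None]


class InstrumentationCollector:
    """Fans incoming messages out to connected Brokers and Correlators."""

    def __init__(self) -> None:
        # Each subscriber is (delivery callback, requested types or None).
        self._subscribers: List[Tuple[Deliver, Optional[Set[str]]]] = []

    def connect(self, deliver: Deliver,
                requested_types: Optional[Iterable[str]] = None) -> None:
        # None reproduces the current pass-through behaviour (receive everything);
        # a set of types models the planned event request filtering.
        wanted = None if requested_types is None else set(requested_types)
        self._subscribers.append((deliver, wanted))

    def on_message(self, msg: Message) -> None:
        """Called for each message received from an Instrumentation Daemon."""
        for deliver, wanted in self._subscribers:
            if wanted is None or msg["type"] in wanted:
                deliver(msg)
```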
Instrumentation Correlators

The Instrumentation Correlators provide grammar-driven capabilities for correlating, 
combining, and reformatting application data into higher-level metrics (composite events) 
for use by displays or other Resource Management components. Each Correlator reads in a 
user-specified correlation grammar file that is interpreted at run-time by the Correlator's 
instrumentation correlation engine. 
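The correlation grammar itself is not reproduced in this section, so the sketch below substitutes a single hypothetical rule type for it: pair a start event with the matching end event that shares a key, and emit a composite latency event. A grammar-driven engine would parse such rules from the user-specified file at run time rather than construct them in code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PairRule:
    """Hypothetical rule: match `start` and `end` events that share `key`."""
    start: str
    end: str
    key: str
    output: str


class Correlator:
    def __init__(self, rules: List[PairRule]) -> None:
        self.rules = rules
        # (rule output name, key value) -> timestamp of the unmatched start event
        self.pending: Dict[Tuple[str, object], float] = {}

    def feed(self, event: Dict[str, object]) -> List[Dict[str, object]]:
        """Consume one instrumentation event and return any composite events."""
        composites = []
        for rule in self.rules:
            slot = (rule.output, event.get(rule.key))
            if event["type"] == rule.start:
                self.pending[slot] = event["time"]
            elif event["type"] == rule.end and slot in self.pending:
                started = self.pending.pop(slot)
                composites.append({"type": rule.output,
                                   rule.key: event[rule.key],
                                   "latency": event["time"] - started})
        return composites
```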

Instrumentation Brokers 

The Instrumentation Brokers are task-specific applications built around a common
code package. The Instrumentation Brokers receive instrumentation messages from the
Instrumentation Collectors, filter all received instrumentation messages to find the messages
of interest, and perform task-specific message data reformatting and manipulation for driving
displays or other Resource Management components. The Instrumentation Broker approach
allows instrumentation data sources to be quickly integrated for test, display, and debugging
purposes. (As the Instrumentation Correlator grammar and correlation engine mature in future
releases, it is anticipated that the Instrumentation Broker approach will be used less
frequently.)

Jewel Instrumentation Broker (QoS Monitor) 

The Jewel Instrumentation Broker (hereafter referred to as the QoS Monitor) is a legacy
architecture component that served as a broker between the Jewel instrumentation package
components and Resource Management components and displays. The QoS Monitor was
responsible for polling the Jewel Collector components to retrieve application event
messages. These messages were then reformatted and used to drive several displays and the
QoS Managers. The Jewel instrumentation package has now been replaced in all
applications; however, the message reformatting capabilities of the QoS Monitor have been
maintained so that several displays and the existing QoS Manager interface do not have to
be upgraded immediately. The QoS Monitor component has been modified so that it receives
instrumentation data from both Jewel and the Instrumentation Collectors.

Middleware 

The RMComms middleware package, which is described in the RMComms
Middleware Design Report, provides the internal message passing interfaces between the
Resource Management components connected via the network. The middleware provides for
automatic, location-transparent, many-to-many client-server connections. Low-overhead,
reliable message passing capabilities are provided. Message handler callback functions can
be registered for specified message types, and the handlers are invoked when messages of
those types arrive. Connection status callback functions, which are invoked when new
connections are made or existing connections are broken, can also be registered. The
middleware package also allows multiple client and server objects to be instantiated in the
same application, is thread-safe, and provides an easy-to-use object-oriented API through
which all capabilities are accessed.
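The two registration mechanisms described above, message handlers keyed by message type and connection-status callbacks, can be illustrated with a small dispatcher. This is a hypothetical Python stand-in, not the RMComms API (whose listing is in an appendix): it omits the network transport and shows only how arriving messages and connection changes could be routed to registered callbacks in a thread-safe way.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class MessageDispatcher:
    """Routes messages and connection events to registered callbacks."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._status_callbacks: List[Callable[[str, bool], None]] = []

    def register_handler(self, msg_type: str, handler: Callable[[Any], None]) -> None:
        with self._lock:
            self._handlers[msg_type].append(handler)

    def register_connection_status(self, callback: Callable[[str, bool], None]) -> None:
        with self._lock:
            self._status_callbacks.append(callback)

    def deliver(self, msg_type: str, payload: Any) -> int:
        """Invoke every handler registered for msg_type; return how many ran."""
        with self._lock:
            handlers = list(self._handlers.get(msg_type, ()))
        for handler in handlers:  # called outside the lock so handlers may register more
            handler(payload)
        return len(handlers)

    def connection_changed(self, peer: str, connected: bool) -> None:
        with self._lock:
            callbacks = list(self._status_callbacks)
        for callback in callbacks:
            callback(peer, connected)
```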

Additional details regarding the Instrumentation functional group FG2 are provided
in CD-Appendix I.

FG42: Resource Manager 

The Resource Manager 42 is the primary decision-making component of the
Resource Management toolkit. It is responsible for: 1) responding to application and host
failures by determining whether recovery actions should be taken and, if so, which ones,
2) determining if and where to place new copies of scalable applications or which scalable
applications should be shut down when the QoS Managers FG44A-FG44N indicate that scale-up
or scale-down actions should be taken based on measured application performance,
3) determining where new applications should be placed when requested to do so by Program
Control, and 4) determining which and how many applications should run based on application
system (mission) priorities. In order to accomplish these tasks, the Resource Manager 42
maintains a global view of the state of the entire distributed environment, including status
information on all hosts, networks, and applications. In addition, the Resource Manager 42
calculates software and hardware readiness metrics and reports these readiness values for
display purposes. FIGS. 1A, 1B show the connectivity and high-level data flow between the
Resource Manager 42 and the other Resource Management-related components.

The Resource Manager 42 receives status and failure information about hosts,
networks, and applications from Program Control. This information includes periodic status
updates as well as immediate updates when statuses change, such as a new host being
detected or an application failing. In the case of applications going down, information as to
whether the applications were shut down on purpose or whether they failed is also sent.
Program Control also issues requests to the Resource Manager 42 when new applications
need to be dynamically allocated and when Program Control determines that the Resource
Manager 42 needs to assess and attempt to resolve inter-application dependencies (such as
an application which needs to be running prior to starting up another application).

The Resource Manager 42 responds to faulted applications and hosts by determining
whether the failed applications can and should be restarted and attempting to determine
where (and if) there are hosts available that the application can run on. When a decision is
made by the Resource Manager 42, a message is sent to Program Control specifying what
application to start and where to put it. The same general mechanism is used when Program
Control requests that the Resource Manager 42 determine where to start new applications
and/or how to resolve inter-application dependencies; the Resource Manager 42 responds
with orders indicating what applications to start and where to start them. The Resource
Manager 42 also sends application shutdown orders to Program Control requesting that
certain applications be stopped; this can occur when the QoS Managers FG44A-FG44N
indicate that certain scalable applications have too many copies running or when application
system priority changes (to lower priorities) occur, resulting in scaling back the application
system configuration.

The Resource Manager 42 receives host load and host fitness information on all
known hosts from the Hardware Broker 40 (Host Load Analyzer). This information includes
overall host fitness scores, CPU-based fitness scores, network-based fitness scores, and
memory and paging-based fitness scores, along with the SPEC95 ratings of the hosts. This
information is received approximately once a second and includes information on all known 
hosts in the distributed system. These scores are used by the Resource Manager 42 for 
determining the "best" hosts for placing new applications when: 1) responding to requests 
from the QoS Managers FG44A-FG44N to scale up additional copies of an application, 2) 
attempting to restart failed applications, 3) responding to requests to dynamically allocate 
certain applications, and 4) responding to application system (mission) priority changes 
which require scaling up additional applications. 
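Selecting the "best" host can be sketched as filtering and sorting the most recent fitness report. The sketch assumes that a higher overall fitness score is better, uses the SPEC95 rating only to break ties, and restricts candidates to hosts that are up and on which the System Specification Files allow the application to run. The field names are hypothetical, and the Resource Manager's actual decision logic (CD-Appendix M) may weigh the individual scores differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class HostFitness:
    """One host's entry in a Hardware Broker report (field names hypothetical)."""
    host: str
    overall: float
    cpu: float
    network: float
    memory: float
    spec95: float


def rank_hosts(reports: Iterable[HostFitness], allowed: Set[str],
               up: Set[str], top_n: int = 3) -> List[HostFitness]:
    """Return the best candidate hosts, best first.

    Assumes a higher overall fitness score is better; the SPEC95 rating
    breaks ties and the host name makes the order fully deterministic.
    """
    candidates = [r for r in reports if r.host in allowed and r.host in up]
    candidates.sort(key=lambda r: (-r.overall, -r.spec95, r.host))
    return candidates[:top_n]
```

The first entry would be the placement; the remaining entries correspond to the "next best host choices" that can be reported to the Decision Review Display.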

The Resource Manager 42 receives requests from the QoS Managers FG44A-FG44N
for scaling up, moving, or scaling down specific applications. The Resource Manager FG42
responds to these requests by determining whether the request should be acted upon and, if
so, determines the specific action to take and issues orders to Program Control to start up or
shut down specific applications on specific hosts. The QoS Managers FG44A-FG44N are
responsible for monitoring specific system performance metrics (e.g., quality of service, or
QoS, requirements) via instrumentation and determining if performance can be improved by
scaling up or moving certain applications. When this occurs, the QoS Managers
FG44A-FG44N send a request to the Resource Manager FG42 indicating that a new copy of a
specific application should be started. If the QoS Managers FG44A-FG44N determine that
the performance of a scalable application can be improved by moving an application, a
scale-up request is first sent to the Resource Manager FG42 and, when the new application
has been started, a scale-down request is then sent to the Resource Manager. Also, when the
QoS Managers FG44A-FG44N determine that there are more copies of a scalable application
running than are needed, requests to shut down specific applications are sent to the Resource
Manager FG42. The division of responsibility is that the QoS Managers FG44A-FG44N
determine what actions would potentially improve performance, but the Resource Manager
FG42 has final authority to determine whether to implement the requested actions.
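The ordering of a "move" matters: the new copy is requested and confirmed before the old one is released, so the application never drops below its current number of copies, and either step can be refused because the Resource Manager FG42 keeps final authority. A hypothetical sketch of the QoS Manager side follows, with the messaging to the Resource Manager abstracted as a `submit` function that returns whether the request was approved.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Request:
    kind: str                   # "scale_up" or "scale_down"
    app: str
    host: Optional[str] = None  # copy to shut down (scale_down only)


def move_application(app: str, from_host: str,
                     submit: Callable[[Request], bool],
                     wait_until_started: Callable[[str], bool]) -> bool:
    """QoS Manager side of a move: add the new copy first, then retire the old one.

    `submit` sends a request to the Resource Manager and returns whether it
    was approved; the Resource Manager may refuse either step.
    """
    if not submit(Request("scale_up", app)):
        return False  # refused: the existing copy keeps running
    if not wait_until_started(app):
        return False  # new copy never came up: keep the old one
    return submit(Request("scale_down", app, from_host))
```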

When the Resource Manager FG42 is first started, it reads in the System
Specification Files (via System Specification Library (SSL) calls), which contain the list of
hosts that are known to be in the distributed environment and information on all applications
that can be run in the distributed environment. The System Specification Files also include
application-level information, including where specific applications can be run, which
applications are scalable, which applications can be restarted, and any dependencies between
applications.
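The dependency information in the System Specification Files is enough to derive a start order. As an illustration (not the Resource Manager's actual algorithm), Python's standard `graphlib.TopologicalSorter`, available from Python 3.9, produces an order in which every prerequisite application precedes the applications that need it, and reports circular dependencies as a `CycleError`. The application names are made up.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def startup_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order applications so that each one's prerequisites start before it.

    `depends_on` maps an application to the applications that must already be
    running; a circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


deps = {
    "track_display": {"tracker"},
    "tracker": {"sensor_io", "time_service"},
    "sensor_io": {"time_service"},
}
order = startup_order(deps)
```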

The Resource Manager FG42 can also receive updated application survivability
specifications from the QoS Specification Control component. This information overrides
the application survivability information that was initially loaded in from the System
Specification Files for specified applications. The information is used by the Resource
Manager FG42 to determine whether the specific applications will be restarted if they fail at
run-time.
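A minimal sketch of how such an override could be applied, assuming a per-application record with a restartable flag and a hypothetical restart limit: values received from QoS Specification Control replace those loaded from the System Specification Files, and applications that were shut down on purpose are never restarted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Survivability:
    restartable: bool
    max_restarts: int = 3  # hypothetical limit, not taken from the source


def effective_policies(from_spec_files: Dict[str, Survivability],
                       overrides: Dict[str, Survivability]) -> Dict[str, Survivability]:
    """Values from QoS Specification Control replace those from the spec files."""
    merged = dict(from_spec_files)
    merged.update(overrides)
    return merged


def should_restart(app: str, policies: Dict[str, Survivability],
                   restarts_so_far: int, shut_down_on_purpose: bool) -> bool:
    if shut_down_on_purpose:  # a deliberate shutdown is not a failure
        return False
    policy = policies.get(app)
    return bool(policy and policy.restartable and restarts_so_far < policy.max_restarts)
```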

The Resource Manager FG42 sends application system and hardware system
readiness and system (mission) priority information to the Readiness Broker and to the
Globus Broker. The Readiness Broker is responsible for driving a GUI/display which shows
the current readiness data and allows the system (mission) priorities to be changed and sent
back to the Resource Manager FG42. The Globus Broker provides basically the same
functionality, except that it receives only a high-level subset of the readiness data provided
to the Readiness Broker. The readiness information sent to the Readiness Broker consists of
readiness values for each application, application subsystem, and application system defined
in the System Specification Files. The readiness scores are currently based on the status
(up/down) of the applications within a system or subsystem along with the percentage of
potential copies of scalable applications that are currently running. Host and network
readiness scores are also calculated and are determined based on the host load information
and host fitness scores received from the Hardware Broker 40.
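The readiness formula is not spelled out beyond its inputs. One plausible reading, assumed here rather than taken from the source, scores each application as 0 when down, 1 when a non-scalable application is up, and the fraction of potential copies running for a scalable application, then averages those scores up through subsystems and systems:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class AppStatus:
    name: str
    running_copies: int
    max_copies: int = 1  # 1 for a non-scalable application


def app_readiness(status: AppStatus) -> float:
    """0.0 when down; otherwise the fraction of potential copies running."""
    if status.running_copies <= 0:
        return 0.0
    return min(status.running_copies, status.max_copies) / status.max_copies


def subsystem_readiness(apps: List[AppStatus]) -> float:
    return mean(app_readiness(a) for a in apps) if apps else 0.0


def system_readiness(subsystems: Dict[str, List[AppStatus]]) -> float:
    scores = [subsystem_readiness(apps) for apps in subsystems.values()]
    return mean(scores) if scores else 0.0
```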

The Resource Manager FG42 also sends information about allocation and
reallocation decisions to the Resource Management Decision Review Display. Information
on the decision that was made, what event the decision was in response to, and how long it
took to both make the decision and implement the decision is sent to the display. In
addition, information about the top choices for where an application could have potentially
been placed is also sent (if applicable); this information includes the host fitness scores for
the selected host and the next best host choices which could have been selected.

See CD-Appendix M for additional details regarding Resource Manager FG42. 

In the Background section of the application, the reader may have interpreted the
sentence "The present invention relates generally to resource management systems by which
networked computers cooperate in performing at least one task too complex for a single
computer to perform" to indicate that the Resource Management Architecture is limited to
such applications. However, while the Resource Management Architecture generally supports
tasks distributed across multiple hosts, it is not limited to only those tasks that must be
distributed due to the inability to run them on a single machine. Moreover, the Resource
Management functional elements advantageously could be used to control a set of
applications which all run on the same machine while still providing monitoring, fault
tolerance, etc. (albeit that this is not the normal or even the intended configuration).
Furthermore, the Resource Management Architecture, as discussed above, deals with
resource managed applications, where the managed characteristic may be one of scalability,
survivability, fault tolerance, or priority.

FIG. 15 is a block diagram of a CPU-based system 400, corresponding to one or more
of the hosts A-N. The system 400 includes a central processing unit (CPU) 402, e.g., a
microprocessor, that communicates with the RAM 412 and an I/O device 408 over a bus 420.
It must be noted that the bus 420 may be a series of buses and bridges commonly used in a
processor-based system, but for convenience purposes only, the bus 420 has been illustrated
as a single bus. A second I/O device 410 is provided in an exemplary case. The
processor-based system 400 also includes a primary memory 412 and an additional memory 414,
which could be either a read-only memory (ROM) or another memory device, e.g., a hard
drive or the like. The CPU-based system may include peripheral devices such as a floppy
disk drive 404, a compact disk (CD) ROM drive 406, a display (not shown), a keyboard (not
shown), and a mouse (also not shown), that communicate with the CPU 402 over the bus 420,
as is well known in the art. It will be appreciated that either one of the memories 412 or
414 advantageously can be employed to store computer readable instructions for converting
the general purpose system 400 into one of the hosts A-N. It will also be appreciated that the
nature of the distributed environment permits the necessary applications and APIs needed to
implement the Resource Management Architecture to be stored anywhere on the network.
Appendix   Location   Description
A          Attached   Resource Management Architecture Function List
B          Attached   Standard Instrumentation Message Format
C          Attached   API Listing for RMComms
D          CD         Resource Manager Interface Messages
E          CD         Host Load Analyzer (Hardware Broker) Function
F          CD         Quality-of-service (QoS) Manager Function
G          CD         FG3: System Specification Language & System Specification Library (SSL) Functions
H          CD         Host And Network Monitoring Functional Group
I          CD         Instrumentation Functional Group
J          CD         Display Functional Group
K          CD         RMComms Network Communication Middleware Design
L          CD         System Readiness Display
M          CD         Resource Manager FG42
N          CD         Instrumentation Graph Tool
O          CD         Host Discovery Function
P          CD         Instrumentation Application Programming Interface (API)
Q          CD         Program Control Application Controller
R          CD         Program Control Display
S          CD         Program Control Functional Group
T          CD         QoS Manager
U          CD         Resource Allocation Decision-making Functional Group

Table III provides a listing of the Appendices included for all purposes in the 
application. It will be noted that the majority of the listed Appendices are provided on the 
CD-ROM filed concurrently with the application. In addition, the CD-ROM also includes
the source code listing for the Resource Management Architecture according to the present 
invention. 

Although presently preferred embodiments of the present invention have been
described in detail herein, it should be clearly understood that many variations and/or
modifications of the basic inventive concepts herein taught, which may appear to those
skilled in the pertinent art, will still fall within the spirit and scope of the present invention,
as defined in the appended claims.
internet, analyze that data, and then re^xnt the results back to SETI. 
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distributed environment induding a plurality of hosts capable of executing multiple copies 
of a scalable implication, which indudes a first device for generating first data corresponding 
to performance of all copies of the scalable qjplication; a second device for generating 
second data corresponding to performance of all host in the distributed en>ironment; and a 
third device for generating performance metrics based on the first and second data. 
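To make the division of labor among the three devices concrete, here is a minimal sketch. It assumes each copy of the scalable application reports a single latency figure ("first data") and each host reports a CPU utilization fraction ("second data"); the class names, fields, and the particular metrics computed are hypothetical illustrations, not the claimed implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CopyReport:
    """First data (hypothetical fields): one copy of the scalable application."""
    host: str
    latency_ms: float


@dataclass
class HostReport:
    """Second data (hypothetical fields): one host in the distributed environment."""
    host: str
    cpu_util: float  # fraction in 0.0-1.0


def performance_metrics(copies, hosts):
    """Third device: derive metrics from first and second data (illustrative set)."""
    cpu_by_host = {h.host: h.cpu_util for h in hosts}
    return {
        "copy_count": len(copies),
        "mean_latency_ms": mean(c.latency_ms for c in copies),
        "worst_copy_host": max(copies, key=lambda c: c.latency_ms).host,
        # Load on the hosts actually running copies ties the two data sets together.
        "mean_cpu_of_hosting_hosts": mean(cpu_by_host[c.host] for c in copies),
    }
```

For example, two copies on hosts A and B with latencies of 120 ms and 80 ms give a mean latency of 100 ms and identify host A as running the slowest copy.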



BRIEF DESCRIPTION OF THE DRAWINGS 

These and various other features and aspects of the present invention will be readily
understood with reference to the following detailed description taken in conjunction with the
accompanying drawings, in which like or similar numbers are used throughout, and in which:

FIGS. 1A, 1B collectively represent a high-level block diagram of hardware and
software components implemented in the Resource Management System according to the
present invention;

FIGS. 2A, 2B collectively represent a functional block diagram of the Resource
Management Architecture according to the present invention;

FIG. 3 is a functional block diagram illustrating functional elements included in the
system specification library (SSL) implementation of the Resource Management System
according to the present invention;

FIG. 4 is a block diagram illustrating one technique for implementing the Resource
(Application) Control functional group FG5 in FIGS. 2A, 2B using discrete software
components;

FIGS. 5A, 5B represent a screen capture of a program control display FG54
generated by the software components illustrated in FIG. 4;

FIGS. 6A, 6B represent a screen capture of a host display generated by the
Resource Management Architecture according to the present invention;

FIGS. 7A, 7B represent a screen capture of performance data regarding several
of the hosts A - N included in FIGS. 6A, 6B;

FIGS. 8A, 8B represent a screen capture of a path display generated by the
Resource Management Architecture according to the present invention;

FIGS. 9A, 9B represent a screen capture of the Resource Management Decision
Review Display, which provides a summary of allocation and reallocation actions taken by
the Resource Manager;

FIGS. 10A, 10B and 11A, 11B represent screen captures illustrating
alternative, user-configurable displays generated from received data via standardized
message formats and open interfaces;

FIGS. 12A, 12B represent a screen capture of an exemplary version of the
Readiness Display FG66 according to the present invention;

FIGS. 13A, 13B, and 13C are block diagrams which are useful in explaining various
operational and functional aspects of the Resource Management Architecture according to
the present invention;

FIG. 14 is a high-level block diagram illustrating connectivity and data flow
between the Hardware Broker and the other Resource Management and Resource
Management-related functional elements in the Resource Management Architecture; and

FIG. 15 is a high-level block diagram of a CPU-based general computer which can
act as a host in the Resource Management Architecture according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The Resource Management Architecture, which was and is being developed by the
Naval Surface Warfare Center - Dahlgren Division (NSWC-DD), provides capabilities for
monitoring hosts, networks, and applications within a distributed computing environment.
Moreover, the Resource Management Architecture provides the capability of dynamically
allocating, and reallocating, applications to hosts as needed in order to maintain user-
specified system performance goals. Advantageously, the Resource Management architecture
provides functionality for determining both how each component within the distributed
environment is performing and what options are available for attempting to correct deficient
performance, determining the proper actions that should be taken, and enacting the
determined course of action. In addition to these capabilities, the architecture also allows for
operator control over creating and loading pre-defined static, dynamic, or combined static
and dynamic system and/or host configurations. One particularly desirable feature of the
Resource Management Architecture is that it provides capabilities for monitoring system
performance along with the ability to dynamically allocate and reallocate system resources
as required.

Before addressing the various features and aspects of the present invention, it would
be useful to establish both terminology and the conventions that the instant application will
follow throughout. In terms of terminology, a glossary section is presented below. In terms
of conventions, this application includes information such as source code listings in an
Appendix section. Since the source code itself is hundreds of pages, the Appendix section
is divided into attached pages, e.g., Attached Appendix A, and an optical disk section, e.g.,
CD-Appendix N. Thus, while the appendices are listed in order, the reader must look to the
signaling language to determine whether any particular appendix is actually provided in
printed form.

API

API (application programming interface) - A set of subroutines or functions that a
program, or application, can call to invoke some functionality contained in another software
or hardware component. The Windows API consists of more than 1,000 functions that
programs written in C, C++, Pascal, and other languages can call to create windows, open
files, and perform other essential tasks. An application that wants to display an on-screen
message can call the Windows MessageBox API function, for example.

BNF

Acronym for 'Backus Normal Form' (often incorrectly expanded as 'Backus-Naur Form'),
a metasyntactic notation used to specify the syntax of programming languages, command
sets, and the like. Widely used for language descriptions but seldom documented anywhere,
so that it must usually be learned by osmosis from other hackers.

DAEMON

A background process on a host or Web server (normally in a UNIX environment),
waiting to perform tasks. Well-known examples of daemons are sendmail and HTTP daemon.

FUNCTION

A capability available on a host due to the presence of software (e.g., a program), a
software module (e.g., an API), etc.

GLOBUS

Wide area network (WAN) enterprise management and control capability developed
under DARPA sponsorship by USC/ISI.

HOST

A device including a central processor controlled by an operating system.

ICMP

Internet Control Message Protocol - ICMP is an extension to the Internet Protocol. It
allows for the generation of error messages, test packets and informational messages related
to IP. It is defined in STD 5, RFC 792.

JEWEL

An open-source instrumentation package produced by the German National Research
Center for Computer Science.

NFS

Network File System - A protocol developed by Sun Microsystems, and defined in
RFC 1094, which allows a computer system to access files over a network as if they were on
its local disks. This protocol has been incorporated in products by more than two hundred
companies, and is now a de facto Internet standard.

QoS

Quality of Service

REMOS

Remos (REsource MOnitoring System) is a network bandwidth and topology monitoring
system developed under DARPA sponsorship by CMU. Remos allows network-aware
applications to obtain relevant information about their execution environment. The major
challenges in defining a uniform interface are network heterogeneity, diversity in traffic
requirements, variability of the information, and resource sharing in the network. Remos
provides an API that addresses these issues by striking a compromise between accuracy (the
information provided is best-effort, but includes statistical information if available) and
efficiency (providing a query-based interface, so applications incur overhead only when they
acquire information). Remos supports two classes of queries. "Flow queries" provide a
portable way to describe a communication step to the Remos implementation, which uses its
platform-dependent knowledge to return to the user the capacity of the network to meet this
request. "Topology queries" reverse the process, with the Remos implementation providing a
portable description of the network's behavior to the application.

SNMP

Simple Network Management Protocol - Internet standard protocol defined in STD 15,
RFC 1157; developed to manage nodes, e.g., hubs and switches, on an IP network.



An exemplary system for implementing the Resource Management Architecture according to
the present invention is illustrated in FIGS. 1A, 1B, which includes a plurality of Host
computers A, B, ..., N operatively connected to one another and to Resource Management
hardware RM via a Network 100. It will be appreciated that the hardware configuration
illustrated in FIGS. 1A, 1B constitutes a so-called grid system. It will also be appreciated that
the network 100 advantageously can be any known network, e.g., a local area network (LAN)
or a wide area network (WAN). It will also be appreciated that the hardware RM need not be
a discrete piece of equipment; the hardware RM advantageously can be distributed across
multiple platforms, e.g., the host computers, as discussed in detail below. In reviewing the
functional elements and applications in the distributed environment, it will be appreciated
that hosts A-N each can instantiate applications 1-M. Thus, when all applications are being
addressed, these applications will be denoted as A1-NM.

Still referring to FIGS. 1A, 1B, each of the hosts A, B, etc., preferably is controlled by
an operating system (OSA, OSB, etc.), which permits Host A, for example, to execute
applications A1 - AN, as well as an instrumentation daemon IDA, a Program Control (PC)
agent PCA, and a Host Monitor HMA. It should be noted that instrumentation daemon IDA,
PC agent PCA, and Host Monitor HMA are integral to the Resource Management
Architecture, while the operating system OSA and applications A1 - AN are well known to
one of ordinary skill in the art.

In FIGS. 1A, 1B, the Resource Management Architecture RM advantageously includes
an instrument collector 10 receiving data from all of the instrumentation daemons
(IDA - IDN) and providing data to instrument correlator(s) 20, which, in turn, provide
correlation data to corresponding quality of service (QoS) managers 30. Resource
Management Architecture RM also receives data from host monitors HMA - HMN at history
servers 40, which maintain status and performance histories on each of the hosts A - N and
provide selected information to host load analyzer 50. Analyzer 50 advantageously
determines the host and network loads for both hosts A-N and their connecting network 100
and provides that information to Resource Manager 60, which is the primary decision-making
component of the Resource Management Architecture. It will be appreciated that Resource
Manager 60 also receives information from the QoS managers 30 and exchanges information
with program controller 70. Program controller 70 sends startup and shutdown orders to the
Program Control Agents based on operator- or Resource Manager-initiated orders. It will be
appreciated that the operator-initiated orders are received via one of the program control
displays 80.

As will be discussed in greater detail below, the Resource Manager 60 is the primary
decision-making component of the Resource Management Architecture. The Resource
Manager 60 is responsible for determining:

• how to respond to host and application failures;

• where (i.e., on which of hosts A-N) to place new applications;

• which applications to start up in response to the detection of a new host (host
N+1);

• how to resolve application dependencies;

• what applications should be started, stopped, or moved in response to
application system priority changes; and

• based on recommendations from the QoS Managers, when and where
scalable applications should be started or stopped.

Before leaving FIGS. 1A, 1B, it should be noted that the functions, e.g.,
instantiated programs or software program modules, in the Resource Management
Architecture advantageously can be distributed across multiple platforms, e.g., multiple hosts
(which may or may not be the illustrated Hosts A-N) or a grid system.

The major functional groups of the Resource Management Architecture according to
the present invention are illustrated in FIGS. 2A, 2B. The functions illustrated as solid
boxes are components of the Resource Management Architecture and are fully described
below; the functions denoted by diagonal striping denote third-party software which has been
integrated with the Resource Management Architecture but does not provide core
functionality. Thus, the latter functions will be described only to the extent necessary to
provide integration details. Moreover, it will be appreciated that the functions and
functionality of the Resource Management Architecture according to the present invention
are interconnected to one another via middleware, which provides message passing interfaces
between substantially all of the Resource Management functions. This middleware package,
RMComms, is fully described below.

The major functional groups provided by the Resource Management architecture in
an exemplary embodiment of the present invention are illustrated in FIGS. 2A, 2B. A
summary of the functions provided by the Resource Management Architecture is available
in Attached Appendix A. These functions, taken together, provide an integrated capability
for monitoring and control of a distributed computing environment. In addition, many of the
functions (and functional groups) within the Resource Management Architecture can also be
run in a non-integrated configuration, thus providing subsets of the integrated Resource
Management capabilities.

These functional groups illustrated in FIGS. 2A, 2B include:

FG1 - Host and Network Monitoring. This function group consists of software which
monitors the host and network resources within the distributed environment. The
function group collects extensive run-time information on host and network
configuration, statuses, and performance. Run-time capabilities for discovering new
hosts that have been started and for determining that existing hosts have gone down
are also provided. Distribution of current and historical status and performance data
to other components of the Resource Management Architecture is also provided. A
more detailed discussion is provided below.

FG2 - Application-Level Instrumentation. The instrumentation function group provides
general-purpose application event reporting and event correlation capabilities.
Capabilities are provided for collecting and correlating application-provided data
such as application statuses, states, performance, and internally detected errors. Low-
overhead API libraries are provided for applications to use in sending out key
internal event and performance data. This application data is forwarded to other
components of the instrumentation subsystem, which collect data from applications
on hosts throughout the distributed environment. The system also provides grammar-
driven capabilities for correlating, combining, and reformatting application data into
higher-level metrics (composite events) for use by displays or other Resource
Management components.

FG3 - System Specifications. A specification language has been developed which allows
the user to specify:

1) application software system structure, capabilities, dependencies, and
requirements; and

2) hardware system (computer and network) structure, capabilities, and
configuration.

Specification files, based on this specification language, are created by the user and
provide the model of the software and hardware components of the distributed
computing environment which is used by other Resource Management functions. The
specification information is accessed by other Resource Management functions by
linking in a specification parser library and making library calls to read in the files
and convert them to an internal object model. Specific specification data items can
then be retrieved via an object-oriented API. See the discussion below.

FG4 - Resource Allocation Decision-Making. This subsystem provides the reasoning and
decision-making capabilities of the Resource Management architecture. The
components of this subsystem use information from other subsystems in order to
determine the health and state of the distributed environment and the options that are
available for attempting to recover from faults or unacceptable performance. The
functions in this particular functional group make decisions regarding:

1) where new applications should be started;

2) whether and where failed applications should be restarted;

3) based on application inter-dependencies, whether and where additional
applications should be started prior to starting a particular application;

4) whether applications are meeting performance requirements and whether
and where an application can be scaled up or moved when it is necessary
to improve performance;

5) whether scalable applications are performing well within performance
requirements and can be scaled down, and which copy should be brought
down; and

6) based on operator changes to application system priorities, whether and
where new applications need to be started or whether and which existing
applications need to be shut down.
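Decisions 1), 4), and 5) above can be sketched in a few lines. This is a hedged illustration only: it assumes each host carries a single fitness score (higher is better, in the spirit of the Host Load Analyzer scores described later) and that performance is judged against one latency requirement. The names `Host`, `place_new_application`, and `scaling_decision`, and the 0.5 low-water threshold, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    fitness: float   # hypothetical score; higher means a better placement target
    up: bool = True


def place_new_application(hosts, excluded=()):
    """Decision 1: pick the live, non-excluded host with the best fitness, or None."""
    candidates = [h for h in hosts if h.up and h.name not in excluded]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.fitness).name


def scaling_decision(measured_latency, required_latency, copies, low_water=0.5):
    """Decisions 4 and 5 for a scalable application (hypothetical thresholds)."""
    if measured_latency > required_latency:
        return "scale_up"     # requirement missed: add a copy or move one
    if copies > 1 and measured_latency < low_water * required_latency:
        return "scale_down"   # comfortably within requirement: shed a copy
    return "hold"
```

Keeping a band between the requirement and the low-water mark avoids oscillating between scaling up and scaling down on small fluctuations in measured performance.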

FG5 - Application (Resource) Control. This subsystem provides application control (i.e.,
Program Control) capabilities which permit starting, stopping, and configuring
applications on each of the hosts in the distributed environment. The subsystem
provides both interactive operator control of the distributed environment as well as
automatic control via configuration orders received from the Resource Allocation
Decision-Making Subsystem (i.e., the Resource Manager component). The
interactive controls allow an operator to create, load, save, and edit pre-defined
system configurations (e.g., lists of applications that are to be run, with or without
specific host mappings), determine the status and configuration of currently running
programs, and start and stop any or all applications. Both static (operator-entered)
mappings of applications to hosts and dynamic mappings of applications to hosts
(where the Resource Allocation Decision-Making Subsystem will be queried to
determine the proper mapping at run-time) can be defined. The subsystem also
provides application fault detection capabilities which are triggered by the
unexpected death of an application that was started by the subsystem. A basic host
fault detection capability is also provided, which is triggered based on failure to
receive heartbeat messages from subsystem components running on a particular host.

FG6 - Displays. The display subsystem provides capabilities for visualizing the status,
performance, and health of the hosts, networks, and applications in the distributed
environment. Capabilities are also provided for visualizing the status, performance,
and health of the Resource Management components themselves.

As mentioned above, the RMComms middleware package provides the internal
message passing interfaces between substantially all of the Resource Management functions,
both within each functional group and between the various functional groups. The
middleware provides for automatic, location-transparent, many-to-many client-server
connections. Low-overhead, reliable message passing capabilities are provided. Registration
of message handler callback functions for specified requested message types is provided, with
the message handler functions being invoked when messages arrive. Registration of
connection status callback functions, which are invoked when either new connections are
made or existing connections are broken, is also provided. The middleware package also
allows for multiple client and server objects to be instantiated in the same application, is
thread-safe, and provides an easy-to-use object-oriented API through which all capabilities
are accessed.

A detailed overview of each functional group and each function instantiated within
each of the functional groups FG1 - FG6 of the exemplary embodiment of the Resource
Management Architecture illustrated in FIGS. 2A, 2B, including the capabilities
provided by the functional group or function, will now be described in greater detail. The
discussion below also includes an overview of the information flow between function blocks
within the same functional group and between function blocks in separate functional groups.

FG1 - Host and Network Monitoring Functional Group

Functional group FG1 provides extensive monitoring capabilities at the host and
network levels. The information monitored includes statuses, configuration information,
performance metrics, and detected fault conditions. By monitoring the individual hosts and
network components within the distributed environment, the functional group FG1
determines:

• Accurate State and Performance Information, primarily by gathering the level
of information necessary for accurately determining the state and health of
each machine and network component;

• Distribution of Current Data to Resource Management Components, by
providing current performance and status information, either periodically or
on request; and

• Distribution of Historical Data to Resource Management Components, thus
providing historical performance and status information on request.

It will be appreciated that the functional group FG1 makes these determinations by
(or while) providing:

• Common Monitored Data Set and Formats, which permit functional group
FG1 to gather the same set of statuses and statistics in the same formats for
each host regardless of machine architecture or operating system.

• Minimally-Intrusive Data Collection Mechanisms, which permit functional
group FG1 to gather the information in as non-intrusive a manner as possible
(in terms of CPU utilization, network bandwidth utilization, etc.).

• Near Real-Time Data Collection Mechanisms, which permit functional
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group FGl to gather the information in as timely a manner as possible. 

The Host and Network Monitoring functional group FG1 includes the four functions set forth below:

1) Host Monitors FG10A - FG10N, which reside on each respective machine
in the distributed environment and collect extensive operating system-level
data for each host A - N.

2) History Servers FG12A - FG12N, which collect data from the Host
Monitors FG10A - FG10N, respectively, maintain status and performance
histories on each host A - N in the distributed environment, i.e., in the
Resource Management Architecture, and provide this information to displays
and other functions within the Resource Management Architecture.

3) Host Discovery Function FG14, which uses Simple Network Management
Protocol (SNMP) calls and Internet Control Message Protocol (ICMP) ping
calls to determine when new hosts, e.g., host N+1, come on-line and if an
existing host, e.g., host K, goes down.

4) Remos Network Data Broker Function FG16, which collects information
on network link bandwidths from the SNMP-based Remos tool (developed
by Carnegie Mellon University) and passes this information to the Host Load
Analyzer function of the Resource Allocation Decision-Making functional
group FG4, both of which are discussed in greater detail below.

Host Monitors FG10A - FG10N, which monitor the status and performance of hosts
A - N, respectively, are instantiated on each host machine within the distributed environment.
Host Monitors FG10A - FG10N employ operating system-level mechanisms to retrieve
status, configuration, and performance information on each host A - N. The information
retrieved includes:

1) operating system version and machine configuration;

2) CPU configuration, status, and utilization;

3) memory configuration and usage;

4) network configuration, status, and utilization;

5) filesystem configuration, status, and utilization; and

6) process status including CPU, memory, network, and filesystem utilization
for each process.
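As a rough illustration of items 1), 2), and 5), the following sketch gathers a small, portable subset of that information using only Python standard-library calls. It is an approximation under stated assumptions: the actual Host Monitors use native operating-system mechanisms and collect far more (memory, network, and per-process data), the 1-minute load average merely stands in for CPU utilization, and `os.getloadavg()` is not available on every platform. The function name `sample_host` is hypothetical.

```python
import os
import platform
import shutil


def sample_host(path="/"):
    """Collect a small, portable subset of the Host Monitor data items."""
    sample = {
        "os_name": platform.system(),      # item 1: operating system
        "os_release": platform.release(),  # item 1: OS version
        "machine": platform.machine(),     # item 1: machine configuration
        "cpu_count": os.cpu_count(),       # item 2: CPU configuration (may be None)
        "load_avg_1min": None,             # item 2: stand-in for CPU utilization
    }
    try:
        sample["load_avg_1min"] = os.getloadavg()[0]  # Unix-like systems only
    except (AttributeError, OSError):
        pass  # leave as None: this item is not valid on this platform
    usage = shutil.disk_usage(path)        # item 5: filesystem utilization
    sample["fs_total_bytes"] = usage.total
    sample["fs_used_fraction"] = usage.used / usage.total if usage.total else 0.0
    return sample


if __name__ == "__main__":
    print(sample_host())
```

Leaving an unavailable item as `None`, rather than inventing a value, mirrors the platform-dependence that the message formats described next must account for.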

While Host Monitors FG10A - FG10N are primarily responsible for monitoring the status
of a particular host, they also provide information on network load as seen by that particular
host. In the same manner, the Host Monitors FG10A - FG10N also provide information and
statistics concerning any remotely mounted filesystems, e.g., Network File System (NFS).

The information that the Host Monitors FG10A - FG10N collect advantageously can
be formatted into operating system-independent message formats. These message formats
provide a pseudo-standardized set of state, status, and performance information which is
useful to other components of the Resource Management Architecture, i.e., other components
do not have to be aware of or deal with the minor differences between data formats and
semantics. It will be appreciated that not all the state and performance data is available
on every platform; in order to indicate which information is available, a group of flags is
set in the host configuration message indicating whether specific data items are valid on a
particular platform.
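The validity-flag idea can be sketched with a bit-flag enumeration. The flag names, bit assignments, and dictionary-based message layout below are hypothetical; the text specifies only that a group of flags in the host configuration message marks which data items are valid on a given platform.

```python
from enum import IntFlag


class ValidFields(IntFlag):
    """Hypothetical validity flags carried in a host configuration message."""
    NONE = 0
    CPU_UTIL = 1
    MEMORY = 2
    PAGING = 4
    NETWORK = 8
    NFS = 16


FIELD_FLAGS = {
    "cpu_util": ValidFields.CPU_UTIL,
    "memory": ValidFields.MEMORY,
    "paging": ValidFields.PAGING,
    "network": ValidFields.NETWORK,
    "nfs": ValidFields.NFS,
}


def build_host_config(values):
    """Set one flag for every data item this platform actually reported."""
    flags = ValidFields.NONE
    for key, flag in FIELD_FLAGS.items():
        if values.get(key) is not None:
            flags |= flag
    return {"valid": flags, "values": values}


def usable(message, field):
    """Consumers test the flag instead of knowing per-OS data semantics."""
    return bool(message["valid"] & field)
```

A consumer such as a display can therefore skip an unsupported chart (for example, paging activity) simply by testing the corresponding flag.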

History Servers FG12A - FG12N are responsible for collecting information from the
Host Monitors FG10A - FG10N and maintaining histories on the statuses, statistics, and
performance of each host A - N in the distributed environment. This information
advantageously can be requested by other functions instantiated in the Resource Management
Architecture. Preferably, the primary consumers of the status information obtained by the
History Servers FG12A - FG12N are the Host Load Analyzer (Hardware Broker) component
of the Resource Allocation Decision-Making functional group FG4, the Host Displays
FG62A - FG62N, and the Path Display FG64 of the Displays functional group FG6. The Host
Load Analyzer FG40 receives information on host configuration and loads (primarily CPU,
memory, and network data) from History Servers FG12A - FG12N and employs this
information to assign host fitness scores. Each Host Display, e.g., FG62A, receives and
displays current status information on one of the hosts A - N, including process status
information and network connectivity information. Each Host Display can also request that
a respective one of the History Servers FG12A - FG12N provide CPU load information,
network load information, paging activity data, and memory utilization information, which
is used to drive line graph charts for specific selected hosts.
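A bounded per-host history and a fitness score derived from it might look like the following sketch. The text does not give the Host Load Analyzer's scoring formula; the weighted-headroom score, the weights (0.5, 0.3, 0.2), the five-sample window, and the class and function names are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


class HostHistory:
    """Bounded history for one host, loosely modeled on a History Server."""

    def __init__(self, maxlen=60):
        self.samples = deque(maxlen=maxlen)  # oldest samples are discarded

    def record(self, cpu, mem, net):
        """Store one sample; each value is a utilization fraction in 0.0-1.0."""
        self.samples.append((cpu, mem, net))

    def averages(self, last_n):
        """Mean CPU, memory, and network utilization over the last_n samples."""
        if not self.samples:
            raise ValueError("no samples recorded yet")
        recent = list(self.samples)[-last_n:]
        return tuple(mean(column) for column in zip(*recent))


def fitness_score(history, last_n=5, weights=(0.5, 0.3, 0.2)):
    """Hypothetical score: weighted average headroom (1 - utilization)."""
    cpu, mem, net = history.averages(last_n)
    w_cpu, w_mem, w_net = weights
    return w_cpu * (1 - cpu) + w_mem * (1 - mem) + w_net * (1 - net)
```

Averaging over a short window rather than using the latest sample keeps a momentary load spike from making an otherwise idle host look unfit.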

It will be appreciated that History Servers FG12A - FG12N are designed so that
multiple copies can be run simultaneously. Each of the History Servers FG12A - FG12N
advantageously can be configured either to monitor all Host Monitors or to monitor only a
selected set of Host Monitors. It should be mentioned at this point that the History Servers
FG12A - FG12N determine the list of hosts in the distributed environment that could
potentially be monitored from the System Specification Library. In this manner, the History
Servers advantageously can be used to provide survivability (by having multiple History
Servers connected to each Host Monitor) and/or to perform load-sharing (with the History
Servers FG12A - FG12N each monitoring only a subset of the Host Monitors). It will also
be appreciated that the History Servers FG12A - FG12N can be configured to periodically
record history data to disk. These disk files can then be used for off-line analysis of the
Resource Management Architecture.
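The trade-off between load-sharing and survivability can be expressed as a replication factor. In this sketch, the host list is simply passed in (in the architecture it would come from the System Specification Library), and the round-robin assignment and the `replicas` parameter are illustrative assumptions rather than the configuration mechanism the text describes.

```python
def assign_monitors(hosts, servers, replicas=1):
    """Map each History Server to the Host Monitors it should watch.

    replicas=1 splits the hosts across the servers (pure load-sharing);
    replicas=len(servers) makes every server watch every host (survivability);
    values in between give partial redundancy.
    """
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    plan = {server: [] for server in servers}
    for i, host in enumerate(hosts):
        # Round-robin: each host goes to `replicas` consecutive servers.
        for k in range(replicas):
            plan[servers[(i + k) % len(servers)]].append(host)
    return plan
```

With four hosts and two servers, `replicas=1` gives each server two hosts, while `replicas=2` has both servers watching all four, so either server can fail without losing history for any host.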

The Host Discovery function FG14 employs Perl scripts in making SNMP and ICMP
ping calls. These calls are used to periodically scan each subnet and host address in the
distributed environment in an attempt to determine whether there have been any host status
changes. In an exemplary case, the list of hosts and subnets that are to be monitored is read
in from a file; alternatively, this information can reside in and be read from the System
Specification Library, which is discussed in greater detail below.



It should be mentioned that when a new host is first detected, the new host's operating system configuration is queried by the Host Discovery function FG14 via SNMP calls. Information on the newly discovered host and its operating system configuration is then sent to the Program Control function FG50 in application control functional group FG5. Likewise, when a host fails to respond to multiple SNMP and ping queries, a message indicating that the host appears to have gone down is sent to the Program Control function FG50.
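The status-change logic of the Host Discovery function can be sketched as a small state tracker, assuming each scan pass yields one (host, responded) result per address; the Perl, SNMP, and ICMP probing itself is not modelled. The class `HostDiscovery`, the event names, and the three-miss threshold are hypothetical, since the text only says a host must fail "multiple" queries.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical threshold: the text only says "multiple" failed queries.
MISSES_BEFORE_DOWN = 3


class HostDiscovery:
    """Turns per-scan probe results into host status-change events."""

    def __init__(self, misses_before_down: int = MISSES_BEFORE_DOWN) -> None:
        self.misses_before_down = misses_before_down
        self.up: Dict[str, bool] = {}      # last known status per host
        self.misses: Dict[str, int] = {}   # consecutive failed probes per host

    def scan(self, results: Iterable[Tuple[str, bool]]) -> List[Tuple[str, str]]:
        """Process one scan pass of (host, responded) pairs.

        Returns ("new_host", host) when a host answers and was not known to
        be up (including a host that returns after failing), and
        ("host_down", host) after too many consecutive missed probes. Both
        kinds of event would be forwarded to Program Control.
        """
        events: List[Tuple[str, str]] = []
        for host, responded in results:
            if responded:
                self.misses[host] = 0
                if not self.up.get(host, False):
                    self.up[host] = True
                    events.append(("new_host", host))
            elif self.up.get(host, False):
                self.misses[host] = self.misses.get(host, 0) + 1
                if self.misses[host] >= self.misses_before_down:
                    self.up[host] = False
                    events.append(("host_down", host))
        return events


discovery = HostDiscovery()
print(discovery.scan([("hostA", True), ("hostB", False)]))  # [('new_host', 'hostA')]
for _ in range(3):
    events = discovery.scan([("hostA", False)])
print(events)  # [('host_down', 'hostA')]
```

Requiring several consecutive misses before reporting a failure keeps a single dropped ping from triggering recovery actions.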

The final component of the Host and Network Monitoring functional group FG1 is the Remos Network Data Broker FG16, which receives information on network link bandwidth and network link bandwidth utilization from the SNMP-based Remos network monitoring tool mentioned above. The network information is accessed via the Remos application programming interface (API) library and is then sent on to the Host Load Analyzer (Hardware Broker) function FG40 of the Resource Allocation Decision-Making functional group FG4. The network information received from Remos consists of the maximum potential bandwidth and the current bandwidth utilization on specific host network links. As mentioned above, Remos network monitoring tool FG16 is not a core component of the Resource Management Architecture; that being the case, no further details on either Remos or the Remos Network Data Broker are provided in the instant application.

FG2 - Application-Level Instrumentation Functional Group

The Instrumentation functional group FG2 advantageously provides general-purpose application event reporting and event correlation capabilities. The Instrumentation functional group permits instrumented application data to be easily accessible to other components of the Resource Management Architecture. The functional group provides capabilities for collecting and correlating application-provided data such as application statuses, states, performance, and internally detected errors. Low-overhead APIs are provided that the applications can use for sending internal event and performance data to the instrumentation components. The Instrumentation functional group FG2 can collect data from applications on hosts A - N throughout the distributed environment. The functional group also provides grammar-driven capabilities for correlating, combining, and reformatting application data into high-level metrics (composite events) for use by displays or other functional groups of the Resource Management Architecture.



The Instrumentation functional group provides:

open APIs and non-proprietary architecture
near real-time monitoring support
cross-language support: C, C++, Ada
cross-platform support: Solaris, IRIX, Linux, etc.
simple, easy-to-use APIs
low-intrusive instrumentation interface
instrumentation interface that does not significantly change the run-time behavior of the applications
support for passing a wide range of data types
support for data marshalling / unmarshalling (system-independent data formats)
support for adding to or changing the information being instrumented without having to recompile portions of the architecture unaffected by the changes (preferably, no recompilation should be necessary except for recompilation of the app being instrumented and any evaluation logic or displays that have been affected by the changes)
scalable architecture (100+ hosts / 20+ apps per host / 5+ threads per app)
ability for the architecture to perform auto-configuration as required
ability to run multiple tests, multiple displays and multiple data logging components simultaneously
ability to abstract away the underlying connectivity/communications between [...]
ability for instrumentation infrastructure to be brought [...] application is running
ability to easily build and configure new displays and data logging components (interactive configuration is preferable)
ability to easily build and configure new performance and data correlation components (interactive configuration is preferable)
backwards compatibility with existing Jewel Instrumentation displays (protect investments in existing display capabilities)
backwards compatibility with existing Jewel Instrumentation function calls (provide ease of transition / backfit)



As illustrated in FIGS. 2A, 2B, the Instrumentation functional group FG2 includes the components enumerated below. In addition, the Instrumentation APIs and Jewel Instrumentation will be addressed along with the Instrumentation functional group, i.e., the Instrumentation functional group includes:



1) Instrumentation API Libraries FG20 are linked with the applications and provide the function call interfaces by which these applications send instrumentation data.

2) Instrumentation Daemons FG22A - FG22N reside on each host in the distributed environment and are responsible for reading instrumentation data sent out by the applications, reformatting the data into instrumentation event messages, and sending the messages to the Instrumentation Collectors.

3) Instrumentation Collectors FG24A - FG24N connect to the Instrumentation Daemons FG22A - FG22N on each host and receive instrumentation messages from hosts A - N. The Collectors forward received messages to the Instrumentation Correlators FG26A - FG26N and Instrumentation Brokers FG28A - FG28N.

4) Instrumentation Correlators FG26A - FG26N receive instrumentation messages from the Instrumentation Collectors FG24A - FG24N and provide grammar-driven capabilities for correlating, combining, and reformatting application data into higher-level metrics (composite events) for use by displays or other functions of the Resource Management Architecture.

5) Instrumentation Brokers FG28A - FG28N receive instrumentation messages from the Instrumentation Collectors and perform task-specific reformatting and data manipulation for driving displays or other Resource Management components.

6) Jewel Instrumentation Broker (QoS Monitor) FG29 (a legacy component) receives instrumentation data from either the open source Jewel instrumentation package or from the Instrumentation Collectors. The QoS Monitor FG29 performs task-specific message reformatting and data manipulation for driving displays and the QoS Managers FG44A - FG44N.

The applications, e.g., A1 - AN, link in the Instrumentation API Library FG20 and make API calls to construct and send out instrumentation event messages. Three separate APIs are provided for use by the applications:

1) a printf()-style API which allows the code to format, build, and send instrumentation data with a single function call;

2) a buffer-construction-style API where multiple function calls are made to construct the instrumentation buffer iteratively, one data element per call; and

3) a Jewel function call API based on the existing API provided by the Jewel instrumentation package (an open-source package produced by the German National Research Center for Computer Science).

The first two APIs are the preferred programming interfaces and take advantage of several key instrumentation features, while the Jewel API is provided solely for backwards compatibility with existing instrumented application code and is implemented as a set of wrappers around the printf()-style API. All three APIs are supported for C and C++. Ada bindings have also been produced for the buffer-construction-style API and the Jewel function call API.
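To contrast the two preferred calling styles, the following Python sketch mimics a printf()-style call and a buffer-construction call that produce the same message, plus a Jewel-style wrapper layered on the printf()-style function. It is an assumption-laden illustration: the real APIs are C, C++, and Ada libraries whose function names are not given here, so `inst_printf`, `InstBuffer`, `jewel_event`, the `%d`/`%f`/`%s` codes, and the in-memory `SENT` list (standing in for the daemon transport) are all hypothetical.

```python
from typing import Any, List, Tuple

# Stand-in for the transport to the local Instrumentation Daemon (a UNIX FIFO
# in the described system); messages are collected in a list here.
SENT: List[Tuple[int, str, Tuple[Any, ...]]] = []

# Hypothetical mapping from Python types to printf()-style conversion codes.
_CODES = {int: "%d", float: "%f", str: "%s"}


def inst_printf(event_id: int, fmt: str, *values: Any) -> None:
    """printf()-style API: format, build, and send in a single call."""
    SENT.append((event_id, fmt, values))


class InstBuffer:
    """Buffer-construction-style API: one data element per call."""

    def __init__(self, event_id: int) -> None:
        self.event_id = event_id
        self.codes: List[str] = []
        self.values: List[Any] = []

    def add(self, value: Any) -> "InstBuffer":
        code = _CODES.get(type(value))
        if code is None:
            raise TypeError(f"unsupported type: {type(value).__name__}")
        self.codes.append(code)  # the format string grows with each element
        self.values.append(value)
        return self

    def send(self) -> None:
        inst_printf(self.event_id, " ".join(self.codes), *self.values)


def jewel_event(event_id: int, *values: int) -> None:
    """Compatibility wrapper layered on the printf()-style API."""
    inst_printf(event_id, " ".join("%d" for _ in values), *values)


# Both preferred styles yield the same message.
inst_printf(7, "%s %d %f", "track", 42, 0.5)
InstBuffer(7).add("track").add(42).add(0.5).send()
print(SENT[0] == SENT[1])  # True
```

Building the buffer element by element lets the format string be derived from the values actually added, which is one reason such an API can be convenient when the number of fields varies at run-time.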

Preferably, the instrumented data is sent from the application to one of the Instrumentation Daemons FG22A - FG22N on a respective one of the hosts A - N where the application is running. The currently preferred mechanism for data transfer is via UNIX FIFO (first in - first out) IPC (inter-process communication) mechanisms. It will be appreciated that the FIFO mechanism was chosen based on reliability, low overhead, and ease of implementation. Alternative data passing mechanisms, including shared message queues, are considered to be within the scope of the present invention.
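A minimal Python sketch of FIFO-based transfer between an application and the local daemon follows. It assumes a 4-byte length prefix for message framing (the framing is not specified in the text) and uses a blocking read in place of the interrupt-driven wakeup that the Instrumentation Daemon uses. The function names are hypothetical; `os.mkfifo` is a standard POSIX facility, so the sketch runs on UNIX-like systems only.

```python
import os
import struct
import tempfile
import threading
from typing import List


def write_messages(fifo_path: str, messages: List[bytes]) -> None:
    """Application side: write length-prefixed messages into the FIFO."""
    with open(fifo_path, "wb") as fifo:  # blocks until a reader opens the FIFO
        for msg in messages:
            # A 4-byte big-endian length prefix preserves message boundaries.
            fifo.write(struct.pack("!I", len(msg)) + msg)


def read_messages(fifo_path: str) -> List[bytes]:
    """Daemon side: read framed messages until every writer has closed."""
    received: List[bytes] = []
    with open(fifo_path, "rb") as fifo:  # blocks until a writer opens the FIFO
        while True:
            header = fifo.read(4)
            if len(header) < 4:  # EOF: no writer holds the FIFO open
                break
            (length,) = struct.unpack("!I", header)
            received.append(fifo.read(length))
    return received


path = os.path.join(tempfile.mkdtemp(), "inst_fifo")
os.mkfifo(path)  # POSIX only
writer = threading.Thread(target=write_messages,
                          args=(path, [b"event 1", b"event 2"]))
writer.start()
result = read_messages(path)
writer.join()
os.remove(path)
print(result)  # [b'event 1', b'event 2']
```

When several applications share one FIFO, POSIX guarantees atomicity only for single writes of at most `PIPE_BUF` bytes, so a production writer would emit each framed message with one unbuffered write of bounded size; the buffered writer above does not enforce that.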

As mentioned above, an Instrumentation Daemon resides on each host in the distributed environment. The Instrumentation Daemon is interrupted whenever new data is written to the FIFO. The Instrumentation Daemon reads the data from the FIFO, reformats the data into the standard internal Instrumentation message format (discussed below), and sends the data to each of the respective Instrumentation Collectors FG24A - FG24N that are currently active. Alternatively, an event request filtering mechanism can be implemented so that specific event messages will only be sent to those ones of the Instrumentation Collectors FG24A - FG24N that have requested the message.

The standard instrumentation message format includes a header, a format string describing the application-provided data contained in the message, and the actual data values. The message components are illustrated in attached Appendix B.
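Because Appendix B is not reproduced here, the layout below is purely hypothetical. It only illustrates how a header, a format string, and the data values it describes can be marshalled into a system-independent (network byte order) buffer and unmarshalled again with Python's standard `struct` module. The header fields, the one-character type codes, and the function names `pack_message` / `unpack_message` are assumptions.

```python
import struct
from typing import Any, List, Tuple

# Hypothetical header (Appendix B is not reproduced here): event id,
# process id, timestamp, and the byte length of the format string.
HEADER = struct.Struct("!IIdH")  # "!" = network byte order, no padding


def pack_message(event_id: int, pid: int, timestamp: float,
                 fmt: str, values: List[Any]) -> bytes:
    """Marshal header, format string, and values into portable bytes.

    One format character per value: 'i' int32, 'd' double, 's' string.
    """
    if len(fmt) != len(values):
        raise ValueError("format string and values disagree")
    fmt_bytes = fmt.encode("ascii")
    body = bytearray()
    for code, value in zip(fmt, values):
        if code == "s":
            data = value.encode("utf-8")
            body += struct.pack("!H", len(data)) + data  # length-prefixed text
        elif code in ("i", "d"):
            body += struct.pack("!" + code, value)
        else:
            raise ValueError(f"unknown format code {code!r}")
    header = HEADER.pack(event_id, pid, timestamp, len(fmt_bytes))
    return header + fmt_bytes + bytes(body)


def unpack_message(buf: bytes) -> Tuple[int, int, float, str, List[Any]]:
    """Unmarshal a message, using its embedded format string as the guide."""
    event_id, pid, timestamp, fmt_len = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    fmt = buf[offset:offset + fmt_len].decode("ascii")
    offset += fmt_len
    values: List[Any] = []
    for code in fmt:
        if code == "s":
            (n,) = struct.unpack_from("!H", buf, offset)
            offset += 2
            values.append(buf[offset:offset + n].decode("utf-8"))
            offset += n
        else:
            (value,) = struct.unpack_from("!" + code, buf, offset)
            offset += struct.calcsize("!" + code)
            values.append(value)
    return event_id, pid, timestamp, fmt, values


msg = pack_message(12, 4321, 1.5, "sid", ["radar", 7, 0.25])
print(unpack_message(msg))  # (12, 4321, 1.5, 'sid', ['radar', 7, 0.25])
```

Carrying the format string inside each message is what lets a receiver decode the values without being compiled against the sender's data layout.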

The Instrumentation Collectors FG24A - FG24N receive instrumentation messages from the Instrumentation Daemons FG22A - FG22N on each host A - N, respectively, in the distributed environment. Currently, the Instrumentation Collectors FG24A - FG24N send every instrumentation message to all Instrumentation Brokers FG28A - FG28N and Instrumentation Correlators (Brokers) FG26A - FG26N that have connected to the Instrumentation Collectors FG24A - FG24N. The Instrumentation Collectors FG24A - FG24N serve as a pass-through server for instrumentation messages. The Instrumentation Collectors do support architecture scalability in the sense that, without the Instrumentation Collectors, the Instrumentation Broker FG29 and Instrumentation Correlators FG26A - FG26N would need to maintain connections to the Instrumentation Daemons FG22A - FG22N on every host. As discussed above, an event request filtering mechanism advantageously can be implemented so that specific event messages will only be sent to those Instrumentation Brokers / Instrumentation Correlators that have requested the message.
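The collector's pass-through role, with and without the optional event request filtering, can be sketched as follows. This is a minimal in-process Python illustration: the class name `InstrumentationCollector`, the callback-based delivery, and the `event_id` message key are assumptions standing in for the network connections and message format of the actual components.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Message = Dict[str, object]
Deliver = Callable[[Message], None]


class InstrumentationCollector:
    """Pass-through server between Daemons and Brokers/Correlators."""

    def __init__(self) -> None:
        self._consumers: List[Tuple[Deliver, Optional[Set[int]]]] = []

    def connect(self, deliver: Deliver,
                event_ids: Optional[Set[int]] = None) -> None:
        """Register a consumer; event_ids=None means "send me everything"."""
        self._consumers.append((deliver, event_ids))

    def on_message(self, msg: Message) -> int:
        """Forward one Daemon message; return how many consumers received it."""
        delivered = 0
        for deliver, wanted in self._consumers:
            # Unfiltered consumers get every message (current behaviour);
            # filtered ones get only the event ids they requested.
            if wanted is None or msg["event_id"] in wanted:
                deliver(msg)
                delivered += 1
        return delivered


collector = InstrumentationCollector()
everything: List[Message] = []
only_event_2: List[Message] = []
collector.connect(everything.append)
collector.connect(only_event_2.append, event_ids={2})
print(collector.on_message({"event_id": 1, "value": 10}))  # 1
print(collector.on_message({"event_id": 2, "value": 20}))  # 2
```

Each Daemon then needs only one connection per Collector rather than one per Broker or Correlator, which is the scalability benefit described above.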

Preferably, the Instrumentation Correlators FG26A - FG26N provide grammar-driven capabilities for correlating, combining, and reformatting application data into higher-level metrics (composite events) for use by displays or other components of the Resource Management Architecture. Each Correlator reads in a user-specified correlation grammar file which is interpreted at run-time by the Correlator's instrumentation correlation engine.
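The correlation grammar itself is not given here, so the following Python sketch replaces it with a single hard-coded rule type. It shows one common kind of composite event: a latency computed by pairing a start event with an end event that share a correlation key. `LatencyRule`, `Correlator`, and the event names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LatencyRule:
    """Hypothetical rule: pair a start and an end event by correlation key."""
    start_event: str
    end_event: str
    composite_name: str


class Correlator:
    def __init__(self, rule: LatencyRule) -> None:
        self.rule = rule
        self._pending: Dict[str, float] = {}  # key -> start timestamp

    def feed(self, event: str, key: str,
             timestamp: float) -> Optional[Tuple[str, str, float]]:
        """Consume one primitive event; return a composite event when a pair completes."""
        if event == self.rule.start_event:
            self._pending[key] = timestamp
        elif event == self.rule.end_event and key in self._pending:
            start = self._pending.pop(key)
            return (self.rule.composite_name, key, timestamp - start)
        return None


correlator = Correlator(LatencyRule("msg_sent", "msg_received", "end_to_end_latency"))
print(correlator.feed("msg_sent", "m1", 10.0))       # None
print(correlator.feed("msg_received", "m1", 10.25))  # ('end_to_end_latency', 'm1', 0.25)
```

In the described system the rule would come from the run-time-interpreted grammar file rather than from Python code, which is what allows new composite events without recompiling the Correlator.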

The Instrumentation Brokers FG28A - FG28N are task-specific applications built around a common code package. The Instrumentation Brokers FG28A - FG28N receive instrumentation messages from the Instrumentation Collectors FG24A - FG24N, filter all received instrumentation messages to find the messages of interest, and perform task-specific message data reformatting and manipulation for driving displays or other components of the Resource Management Architecture. This Instrumentation Broker approach permits instrumentation data sources to be quickly integrated for test, display, and [...] purposes.



It should be mentioned at this point that the Jewel Instrumentation Broker FG29 (hereafter referred to as the QoS Monitor) is a legacy architecture component that served as a broker between the Jewel instrumentation package components and Resource Management components and displays. The QoS Monitor FG29 was responsible for polling the Jewel Collector components to retrieve application event messages. These messages were then reformatted and used to drive several displays and the QoS Managers FG44A - FG44N. The Jewel instrumentation package has now been replaced in all applications; however, the message reformatting capabilities of the QoS Monitor have been maintained so that several displays and the existing QoS Manager interface do not have to be upgraded immediately. The QoS Monitor component has been modified so that it receives instrumentation data from both Jewel and the Instrumentation Collectors.

FG3 - SYSTEM SPECIFICATIONS FUNCTIONAL GROUP

Still referring to FIGS. 2A, 2B, it should be noted that a System Specification Language has been developed which allows the user to specify both (1) software system structure, capabilities, dependencies, and requirements, and (2) hardware system (computer and network) structure, capabilities, and configuration. System Specification Files, generally denoted FG32, which are based on this specification language, are created by the user and provide a model of the software and hardware components of the distributed computing environment which is used by the Resource Management Architecture. The language grammar advantageously can capture the following information related to the distributed environment and the applications that can run within the distributed environment:
Hardware and Operating Systems
Hardware Configuration
Network Configuration
Operating Systems and Version
Software
Systems, Subsystems, Paths, Applications, Processes
Hardware Requirements
Startup Info
Resource Requirements
QoS Requirements (Events)
Survivability Requirements
Data Flow Path Information: Structure and QoS Requirements

It will be appreciated that the System Specification Language allows for grouping hardware and software components into systems and subsystems in order to create a hierarchy of components. Each application system and subsystem can be assigned a priority which is used at run-time to determine the relative importance of applications running in the distributed environment.

At the application level, the hardware, operating system, and other host requirements for each application can be specified along with information describing how to start up, configure, and shut down the application. This information can include:

a) environment variables that need to be set;

b) the working directory for running the application;

c) the path(s) and file name of the application;

d) command-line arguments that should be set, including arguments that need to be resolved at run-time (e.g., the hostname where another application is running, the current date, the current userid, a unique run-time identifier number, etc.);

e) whether the application needs to run in an xterm;

f) whether a script file or signal should be run to shut down the application; and

g) which script or signal should be used.

In addition, startup and shutdown dependencies between applications can be specified. Moreover, application states can be defined based on received instrumentation data values, the length of time an application has been running, and/or the set of processes that are currently running. Furthermore, for each application A1 - NM, the survivability and scalability requirements of the application can be specified. This latter information includes whether an application can be restarted if it fails, whether multiple copies of an application can be run, what type of scalability the application supports (e.g., Primary-Shadow, Load-Sharing, etc.), and the minimum and maximum number of copies that can be run. Moreover, an estimate of the amount of CPU, memory, and network resources that the application will use at run-time advantageously can be specified.

At the host level, the operating system and version, the hardware architecture, the host's network interface name, and the SPEC organization's SPECfp95 and SPECint95 ratings for the host can be specified. At the network level, router and switch configurations and bandwidths can also be specified.

Moreover, application data flow paths can be defined, including a graph of the data flow between applications along with performance requirements tied to one or more of the applications within the path. It should be mentioned that these defined requirements are named and are tied at run-time to Instrumentation Event data provided by the Instrumentation Correlators FG26A - FG26N. Monitoring of the performance requirements is the responsibility of the QoS Manager components FG44A - FG44N, as discussed in greater detail below.

As noted above, the System Specification Language provides a hierarchical structure for defining software and hardware systems. The current structure is shown below:

Software Specifications
Application
Security
Configuration
Dynamic Arguments
Shutdown Info
States
Dependencies
Initial Load Estimate
QoS Info
Survivability
Scalability
Hardware Specifications
Host Info
Network Info
LANs
Network Devices (Interconnects)
Path Specifications
Data Flow Graph
Data Flow Info
QoS Requirements
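Item d) in the application-level list notes that some command-line arguments must be resolved at run-time. A minimal Python sketch of that step follows; the `${NAME}` / `${NAME:parameter}` placeholder syntax, the placeholder names, and `resolve_args` are assumptions, since the actual specification grammar is not reproduced here.

```python
import datetime
import getpass
import itertools
import re
from typing import Callable, Dict, List

# Hypothetical placeholder syntax: ${NAME} or ${NAME:parameter}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::(\w+))?\}")
_run_ids = itertools.count(1)  # source of unique run-time identifiers


def resolve_args(args: List[str], running_on: Dict[str, str],
                 today: Callable[[], datetime.date] = datetime.date.today) -> List[str]:
    """Expand run-time placeholders in command-line arguments.

    ${HOST_OF:app}  host where another application is running
    ${DATE}         current date in ISO format
    ${USER}         current userid
    ${RUN_ID}       unique run-time identifier number
    """
    def substitute(match: "re.Match[str]") -> str:
        name, param = match.group(1), match.group(2)
        if name == "HOST_OF":
            return running_on[param]  # KeyError if that app is not running
        if name == "DATE":
            return today().isoformat()
        if name == "USER":
            return getpass.getuser()
        if name == "RUN_ID":
            return str(next(_run_ids))
        raise ValueError(f"unknown placeholder: {name}")

    return [_PLACEHOLDER.sub(substitute, arg) for arg in args]


print(resolve_args(["--server=${HOST_OF:tracker}", "--date=${DATE}", "--id=${RUN_ID}"],
                   {"tracker": "hostB"}, today=lambda: datetime.date(2000, 5, 25)))
# ['--server=hostB', '--date=2000-05-25', '--id=1']
```

Resolving `${HOST_OF:...}` needs the current application-to-host mapping, so in practice this step would run only after the placement decision has been made.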

The specification information is accessed by linking in a specification parser library FG34 and making library calls to read in the files and convert them to an internal object model, and by making object access method calls to retrieve specific data items. The specification library is written in C++ and has been ported to all of the development platforms in the testbed. The library is currently being used by most of the Resource Management components, including Program Control FG50, the Resource Manager FG42, the QoS Managers FG44A - FG44N, the Hardware Broker FG40, and the History Servers FG12A - FG12N.

It should be mentioned that the software used to construct the API library consists of (1) a parser file that defines the grammar in BNF format, (2) a lexical analyzer file that defines the tokens of the language, and (3) a set of C++ System Specification classes for storing the specification file information. The lexical analyzer file is compiled with the GNU flex (lex) utility and the parser file is compiled using the GNU bison (yacc) utility. The flex and bison utilities create C source files which are then compiled along with the C++ System Specification object storage classes to create the System Specification Library (SSL) FG34. This library is then linked with the Resource Management applications. An overview of this structure is provided in FIG. 3; a more detailed discussion of the various functions is provided below.

FG4 - RESOURCE ALLOCATION DECISION-MAKING FUNCTIONAL GROUP

As illustrated in FIGS. 2A, 2B, the Resource Allocation Decision-Making functional group provides the reasoning and decision-making capabilities of the Resource Management architecture. The functions associated with this functional group employ information (listed below) to (1) determine the state and health of the distributed environment (hosts, networks, and applications), and (2) determine what allocation and reallocation actions need to be taken. The information provided to functional group FG4 includes:

System Specifications:
Host configuration and capabilities
Application capabilities
Survivability
Scalability
Potential hosts to run on
Application startup and shutdown dependencies
Application and path performance requirements
Program Control:
Application statuses
Detected application faults
Detected host failures
Detection of new host
Operator initiated requests
Resolution of application startup or shutdown dependencies
Selection of application-to-host mappings
History Servers:
Host statuses, configuration, and loads
Network link statuses and loads
Remos Network Data Broker:
Network link statuses and loads
Instrumentation Subsystem:
Application performance information
Readiness Display:
Run-time changes to application system priorities

The subsystem components make decisions based on the following triggers and data:

Based on requests from Program Control, determine where new applications should be started

Based on indication of application failure from Program Control, determine whether and where the failed applications should be restarted

Based on indication of host failure from Program Control (or indirectly from Host Discovery), determine whether and where the failed applications should be restarted

Based on application inter-dependencies defined in the System Specification Files, determine whether and where additional applications should be started (or shut down) prior to starting (or shutting down) a particular application

Based on startup and shutdown dependency resolution requests from Program Control, determine whether and where additional applications should be started (or shut down) prior to starting (or shutting down) a particular application

Based on application instrumentation data and performance requirements defined in the System Specification Files, determine whether applications are meeting performance requirements and whether an application can be scaled up or moved to attempt to improve performance

Based on application instrumentation data and performance requirements defined in the System Specification Files, determine whether applications are performing well within performance requirements and can be scaled down

Based on operator changes to application system priorities, determine whether and where new applications need to be started and/or determine whether and which existing applications need to be shut down

Based on indication that a new host is on-line (from Host Discovery via Program Control), issue startup orders to bring up a Program Control Agent, Host Monitor, and Instrumentation Daemon on the new host, which will bring the host under Resource Management control

The Resource Allocation Decision-Making functional group implements the three discrete functions listed below:

1) Resource Manager FG42 is the primary decision-making component of the Resource Management Architecture. Resource Manager FG42 is responsible for determining (1) how to respond to host and application failures, (2) where to place new applications, (3) which applications to start up in response to the detection of a new host, (4) how to resolve application dependencies, (5) what applications should be started, stopped, or moved in response to application system priority changes, and (6) based on recommendations from the QoS Managers FG44A - FG44N, when and where scalable applications should be started or stopped.

2) Host Load Analyzer FG40 is responsible for assigning a set of fitness scores 
to each host based on host capabilities and loads. 

3) QoS Managers FG44A - FG44N are responsible for monitoring application and path requirements as defined in the System Specification Files FG32 and recommending that applications be either scaled up, scaled down, or moved in order to maintain acceptable performance.
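Resolving application dependencies, one of the Resource Manager's duties listed above, can be treated as a topological ordering problem. The Python sketch below assumes dependencies arrive as a mapping from each application to the applications that must already be running; it walks prerequisites depth-first, skips applications that are already running, and rejects cycles. The function name `startup_order` and the example application names are hypothetical.

```python
from typing import Dict, List, Set


def startup_order(target: str, depends_on: Dict[str, List[str]],
                  running: Set[str]) -> List[str]:
    """Return the applications to start, prerequisites first, ending with target.

    depends_on maps an application to the applications that must already be
    running before it starts. Applications already running are skipped; a
    dependency cycle raises ValueError.
    """
    order: List[str] = []
    visiting: Set[str] = set()   # apps on the current depth-first path
    done: Set[str] = set(running)

    def visit(app: str) -> None:
        if app in done:
            return
        if app in visiting:
            raise ValueError(f"dependency cycle involving {app}")
        visiting.add(app)
        for prereq in depends_on.get(app, []):
            visit(prereq)
        visiting.discard(app)
        done.add(app)
        order.append(app)  # appended only after all its prerequisites

    visit(target)
    return order


deps = {"display": ["correlator"], "correlator": ["collector"], "collector": ["daemon"]}
print(startup_order("display", deps, running={"daemon"}))
# ['collector', 'correlator', 'display']
```

Running the same procedure over the reversed graph (dependents instead of prerequisites) would give an analogous order for the shutdown dependencies; where each application is started is a separate placement decision.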

As mentioned above, the Resource Manager FG42 is the primary decision-making component of the Resource Management Architecture. It is responsible for

(1) responding to application and host failures by determining if and what recovery actions should be taken;

(2) determining if and where to place new copies of scalable applications, or which scalable applications should be shut down, when the QoS Managers indicate that scale-up or scale-down actions should be taken based on measured application performance;

(3) determining where new applications should be placed when requested to do so by Program Control; and

(4) determining which and how many applications should run based on application system (mission) priorities.

In order to accomplish these tasks, the Resource Manager FG42 maintains a global view of the state of the entire distributed environment, including status information on all hosts A - N, networks 100, and applications A1 - NM. In addition, the Resource Manager FG42 also calculates software and hardware readiness metrics and reports these readiness values, for display purposes, to the display functional group FG6.
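The Resource Manager's responses to application system priority changes, as set out in this section (None: shut down; Low: at most 50% of maximum scalability; Medium: normal scaling; High: at least 50%; Urgent: 100%), can be expressed as bounds on the number of running copies of each scalable application. The following Python sketch is one reading of those rules. The function names are hypothetical; rounding "50%" down for the Low ceiling and up for the High floor when the maximum is odd is an assumption, as is honoring the specified minimum copy count at Low priority. Restarting required applications after a change away from None is not modelled.

```python
from typing import Tuple


def copy_bounds(priority: str, min_copies: int, max_copies: int) -> Tuple[int, int]:
    """Allowed (lowest, highest) copy counts for a scalable application."""
    half_down = max_copies // 2        # "no more than 50%" (rounded down)
    half_up = (max_copies + 1) // 2    # "at least 50%" (rounded up)
    if priority == "None":
        return (0, 0)                  # the whole system is shut down
    if priority == "Low":
        return (min(min_copies, half_down), half_down)
    if priority == "Medium":
        return (min_copies, max_copies)  # normal scale-up / scale-down
    if priority == "High":
        return (max(min_copies, half_up), max_copies)
    if priority == "Urgent":
        return (max_copies, max_copies)  # 100%, never scaled down
    raise ValueError(f"unknown priority: {priority}")


def target_copies(priority: str, current: int, min_copies: int, max_copies: int) -> int:
    """Clamp the current copy count into the bounds for the new priority."""
    low, high = copy_bounds(priority, min_copies, max_copies)
    return min(max(current, low), high)


for level in ("None", "Low", "Medium", "High", "Urgent"):
    print(level, target_copies(level, current=2, min_copies=1, max_copies=6))
# None 0, Low 2, Medium 2, High 3, Urgent 6 (one per line)
```

Within the resulting bounds, QoS Manager recommendations would still drive scale-up and scale-down; the priority only narrows the range they may act in.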




The Resource Manager FG42 currend)* responds to appticatioo system priorii>' 
changes received from die Readiness Broker (translation software in or associated with the 




h niD be sppredated &Dm fisFICS. i2X 2B thai the Resource Manager FG42 




Readiness Display* FG66) in the foDowipg manner 




receives stanis and failure infonnatioo about hosts, netnwks, and applications froni Program 




(1) If die priorii)- is changed to None, all applications associated with dw 


5 


Controtl function FG50. Hus infonnaiion inchides both penodic status updates and 
immediate updates u-hcD statuses change such as a nei^' host being detected or an appUcadon 
foiling, hi the case of apphcadon shutdo\^n, tnformatioa as to whether the application «-as 
sfautdouTi intentional))- or n-heiha the appticstioo failed is also pro%ided. Program Control 
function FG50 also issues requests to the Resource Manager FG42 w-hen new applications 


5 


specified s>-stem are sfauulowiL 
(2) If die prioritv- b changed to Low, all scalable applications widiin die specified 
s>'Stem are scaled back to no more dian 50% of potential maximum 
scalabilit>- and are not allowed to be scaled up past die 50% limit im^ardless 
of performance. 
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need to be d>'namicall>' allocated and i\-hen the Program Control function FG50 determines 
that die Resource Manager FG42 needs to assess and attempt to resoh e inter-applicatian 
dependencies (such as one application wiiich needs to be nmniiig prior to starting up another 
application). 
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(3) If die priorit>- is changed to Medium, normal scalcup and scaledowii 
functionalii>- is allowed. 

(4) If die priorit>- is changed to High, all scalable applications are scaled up to at 
least 50% of potential maximum scalabilit>- and are not allow^ to be scaled 
down to less dian 50% irregardless of performance. 


15 


The Resource Manager FG42 responds lo faulted applications and hosts b>' 
determining wiwther the failed applications can and should be restarted and attempting to 
detennine n-here (and if) there are hosts available that the application can run on. When a 
decision is made b>' the Resource Manager FG42, a message is sent to Program Control 
FG50 specif)'Tng nhat application to start and uiiere to put it, i.e., u-faidi of hosts A - N to 


15 


(5) If die priority is changed to Urgent, all scalable applications are scaled up to 
100% (for maximum sunivabili^') and are not allowed to be scaled down. 
[Moreover, if die prev ious priority was None, and die new changed priorit>- is hi^er dian 
Nraie, all required applications within the specified s^'stem are started up subject to the 
limitations outlined for each of die priorii>' lev els listed above.] 


20 


start die application on. The same general mechanism is used when Program Control FG50 
requests that the Resource Manager FG42 detennine nhere to start new applications and/or 
how to resolve inter-application dependencies; die Resource ManagcrFe*2Manager FG42 
responds uith orders indicating what applications to start and where to start them. The 
Resource Manager FG42 advantageously can send applicadon shutdown instructions to 


20 


The Resource Manager FG42 also sends information about allocation and 
reallocation decisions to the Resource Management Decision Re\iew Display's FG68A- 
FG68N. as discussed in greater detail below. Information on die decision dial was made, 
what e\eni die decision was in response to, and how long it took to botfi make die decision 


25 


Program Control FC50 requesting dial a certain iq)plication be stopped; this can occur when 
die QoS Managers FG44 A-FG44N indicate that certain scalable applications have too man>' 
copies running or when application s>-stem priorit>' changes (when an application changes 
from a high priority to a lower priorit>*) occur resulting in scaling back the application s>'stem 
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and implement the decision advantageousl)* are also sent to die displa>- functional group F(j6. 
In addition, information about the alternative choices for wtcre an ^plication could have 
potentially been placed is also provided (if applicable); in an exemplar>' case, diis 
infonnation includes die host fimess scores for die selected host and die next best host 
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configuration. 




choices wtdch could have been selected. 




The Resource Manager FG42 also receives host load and host fitness infonnation on 




As described above, die Resource Manager FG42 communicates widi Program 




all known hosts from the Hardware Broker (Host Load Anal>-zer) FG40. This information 




Control FG50, die Hardware Broker FG40. die QoS Managers FG44A -FG44N, QoS 


5 


includes (1) overall host fitness scores, (2) CPU-based fimess scores, (3) network-based 
fitness scores, and (4) memoo^ and pag?ng-based fitness scores, along with (5) the SPEC95™ 
rating of the hosts. These scores are used by the Resource Manager FG42 for determining 
the "best" hosts for placing new applications when: 

( 1 ) responding to requests fitim the QoS Managers to scale additional copies 


5 


Specification Control (not shown), die Readiness Broker of displav- FG66, die Globus Broker 
(e.g, message translation software (not shown)), and die RM Decision Review Displ^-s 
FG68 A- FG68N using die RMComms middleware, which wiU be discussed in greater detail 
below. 


10 


of an application; 

(2) attempting to restart failed applications; 

(3) responding to requests to d>-naniicaU>' allocate certain applications; and 

(4) responding to application s>'Stem (mission) priorit>' dianges which require 
scaling up additional applications. 
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The Haniw^ Broker (Host Load Analyzer) FG40 is die host load analysis 
component of die Resource Managemem Architecture, which is primaril>- responsible for 
determining die host and network loads on each host A - N widun die distributed computing 
en\Tronment The Hardware Broker FG40 assigns a set of fitness scores for ead) host and 
pcriodicalb* provides die list of fitness scores to die Resource Manager FG42. 


15 


The Resource Manager rG42aisoFG42 also receives requests from die (>oS Managers 
FG44 A-FG44N for scaling up, moving, or scaling down specific applications. The Resource 
Manager FG42 responds to dicse requests by determining whether die request should be 
acted upon and, if so, determines die specific action to take. The Resource Manager FG42 
then issues orders to Program Control FG50 to start up or shutdown specific applications on 


15 


The Hardware Broker FG40 advantageousl>- receives operating S)-stem-le\ eI statuses 
and statistics for each host A- N from die History Serv er<s) FG12A-FG12N. respectivelj-. 
This infonnation can be employed for calculating CPU, networi, memory, paging acti\ii>-. 
and overall fitness scores for each of die hosts A-N. Preferably, die Hardware 


20 


specific hosts. 

It should be noted diat when the Resource Manager FG42 is first started, it reads in 
the S>3tem Specification Files FG32 (^ia calls to S>^tem Specification Uhmy FG34) which 
contains the list of hosts thai are known to be associated with the distributed environmem 


20 


BrokeifeteBroker FG40 periodically, eg. once per second, provides dw complete list of 
host fitness scores to die Resource Manager FG42. 

It should be noted dial wiien die Hanlware Broker FG40 is first staned, it reads in die 
System Specification Files FG32 (\ia calls to die System Spedficatioo Library (SSL) FG34), 


25 


and infonnation on all applications dial can be run in die distributed environmem. The 
applicatioo-le\-d infonnation indudes where, i.e., on windi host, specific applications can 
be ntn, winch apphcations are scalable, which applications can be restarted, and Bn>- 
dependenctes between apphcations. 
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which files contain die list of hosts dial are known to be in die distributed environment The 
Hanlw^ Broka FG40 abo recei\cs, eg., reads in a file containing, infonnation about die 
bandwidth and maximum packet sizes on aQ known network subnets in the distributed 
emironment It wiQ be appreciated diat dm data advantageousb' can be used for converting 

Periodically, e.g., approximately every three seconds, the Hardware Broker FG40 transmits a list of overall and network host fitness scores to the Hardware Broker Instrumentation Display, which was constructed using the Graph Tool Instrumentation Displays FG69A-FG69N. Moreover, the Hardware Broker FG40 advantageously can receive host-based network load data from the Remos Network Data Broker function FG16, which receives network data via the Remos Network Monitoring software 2. It should be noted that if Remos network data is available for any of the hosts A-N that are being monitored, the Remos-reported network data advantageously can be used for calculating the network fitness score for that host, rather than using the host network data received from the History Server(s) FG12A-FG12N.
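The text names the per-host scores (CPU, network, memory, paging activity, overall) and the preference for Remos network data, but not the formulas. As a purely illustrative sketch, assuming each score is 100 * (1 - utilization), that host network traffic is normalized by the subnet bandwidth read in at startup, and that the overall score is a weighted sum with hypothetical weights, the calculation might look like this (`HostStats`, `fitness_scores`, `WEIGHTS`, and `max_pages_per_sec` are illustrative names, not part of the architecture):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: the source does not say how the individual
# scores are combined into the overall fitness score.
WEIGHTS = {"cpu": 0.4, "network": 0.2, "memory": 0.2, "paging": 0.2}


@dataclass
class HostStats:
    cpu_util: float                       # fraction of CPU busy, 0.0-1.0
    mem_util: float                       # fraction of memory in use, 0.0-1.0
    pages_per_sec: float                  # paging activity
    host_net_bytes_per_sec: float         # host-reported network traffic
    subnet_bandwidth_bytes_per_sec: float # from the subnet description file
    remos_net_util: Optional[float] = None  # Remos-reported utilization, if any


def clamp_score(util: float) -> float:
    """Map a utilization fraction to a 0-100 score (higher = more available)."""
    return 100.0 * (1.0 - min(max(util, 0.0), 1.0))


def fitness_scores(s: HostStats, max_pages_per_sec: float = 1000.0) -> dict:
    # Prefer Remos network data when it exists, as the text describes;
    # otherwise normalize host traffic by the known subnet bandwidth.
    if s.remos_net_util is not None:
        net_util = s.remos_net_util
    else:
        net_util = s.host_net_bytes_per_sec / s.subnet_bandwidth_bytes_per_sec
    scores = {
        "cpu": clamp_score(s.cpu_util),
        "network": clamp_score(net_util),
        "memory": clamp_score(s.mem_util),
        "paging": clamp_score(s.pages_per_sec / max_pages_per_sec),
    }
    scores["overall"] = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return scores
```

A weighted sum keeps the overall score on the same 0-100 scale as its parts, which makes scores from different hosts directly comparable when the Resource Manager ranks candidate hosts.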

The QoS Managers FG44A-FG44N of functional group FG4 are responsible for monitoring application-level performance requirements. These requirements are defined in the System Specification Files FG32 and are monitored primarily via instrumentation data obtained directly from the application code. The QoS Managers FG44A-FG44N advantageously can determine if applications or application paths are meeting their assigned requirements. If an application is not meeting its performance requirements and the application is scalable (in the sense that multiple copies can be run and the copies will perform load-sharing across the copies), the QoS Managers FG44A-FG44N will either request that the Resource Manager FG42 scale up a new copy of the application or move the application to a new host (as an attempt to achieve better performance). Moreover, if there are multiple copies of a scalable application running, and all copies are performing well below the specified requirement threshold, the QoS Managers FG44A-FG44N will request that the Resource Manager FG42 shut down a specific copy. It should be noted that the division of responsibility between the QoS Managers FG44A-FG44N and the Resource Manager FG42 is that the QoS Managers determine what actions would potentially improve performance, while the Resource Manager has final authority to determine whether to implement the requested action(s).

Each of the QoS Managers FG44A-FG44N can be scaled for both redundancy and load-sharing. In an exemplary case, each copy of the QoS Manager monitors all of the requirements associated with a single application path defined in the System Specification Files FG32. It will be appreciated that the specific path to be monitored can be specified via command-line parameters. By default, without specifying a path via the command line, the QoS Managers FG44A-FG44N will monitor all requirements for all paths defined in the System Specification Files FG32.

It should be mentioned that, in one exemplary embodiment, the QoS Managers FG44A-FG44N each employ a sliding window algorithm to determine when to declare that applications should be scaled up or scaled down. The inputs to the algorithm define both high and low sampling window sizes, the maximum number of allowed violations within the sampling window, and violation thresholds as a percentage of the actual specified requirement value. It should also be mentioned that the sliding window algorithm was selected in order to damp out unexpected "noise" or "spikes" in the measured performance data. Moreover, the threshold value as a percentage of the actual requirement value was selected in order to scale up, or scale down, prior to violating the specified hard requirement. The QoS Managers FG44A-FG44N provide application scale up and scale down requests to the Resource Manager FG42 when the measured performance data for an associated application violates either the high (scale up) or low (scale down) sliding window criteria for a specific requirement. A scale up request indicates which application on which host has violated the performance criteria, and a scale down request indicates which application on which host is recommended to be shut down. It will be appreciated that the success of this algorithm is highly dependent on the rate of change and noisiness of the measured data.

Any of the QoS Managers FG44A-FG44N can also request that the Resource Manager FG42 move an application. This will occur in the case where one copy of an application is performing much worse than all other running copies of the same application. In an exemplary case, the move request is implemented as a scale up request followed by a scale down request (of the badly performing copy). In that case, the scale down request does not get sent to the Resource Manager FG42 until the scale up action has been implemented. The QoS Managers FG44A-FG44N preferably employ application "settling times," defined in the System Specification Files FG32, to ensure that once a requested action has been sent to the Resource Manager FG42, no additional actions are requested for that application until after the settling time has elapsed. It will be appreciated that this provides time for initialization and configuration among the application copies to occur. Alternatively, System Specification Language inter-application dependency definitions advantageously can be used instead of settling times.

The QoS Managers FG44A-FG44N also receive application status and state information from Program Control FG50, which periodically sends application status updates for all running applications and also sends immediate indications of any applications which have been started or stopped. This information is used by the QoS Managers FG44A-FG44N, along with the instrumented performance data being received via the QoS Monitor FG29 and Instrumentation Correlator FG34, to determine the exact state of all monitored applications that are running. This information is also used to determine when (and if) requested actions have been implemented by the Resource Manager FG42. The information is also used for setting up and discarding internal data structures used for monitoring the performance of each application A1-NM.

It will be appreciated that the QoS Managers FG44A-FG44N also receive application-level instrumentation data indicating current application performance values from the Instrumentation Correlators (Brokers) FG26A-FG26N, the Instrumentation Brokers FG28A-FG28N, and/or the Jewel Instrumentation Broker (QoS Monitor) FG29. The instrumentation data that is received contains (at a minimum) (1) the timetag when the data was generated, (2) the hostname and IP address of the host where the application that the data is associated with is running, (3) the process id (pid) of the application that the data is associated with, and (4) the event number of the instrumentation message. Preferably, the event number of the instrumentation message specifies the type of instrumentation data that has been received; the hostname, IP address, and pid are used, in conjunction with the application data received from Program Control FG50, to determine the specific application that the data is associated with.

When the contents of the instrumentation message match any of the application performance requirements that are currently being monitored by the QoS Managers FG44A-FG44N, the data value is added to the proper requirement sliding window for the specified application. The sliding window algorithm is then checked to determine if the new sample triggered a violation of either the high or low sliding window. If a high threshold sliding window violation occurs and the application does not already have the maximum number of copies running, a determination is made as to whether performance can be best improved by starting a new application (scale up) or by moving an existing copy to a different host. The corresponding action recommendation will then be sent to the Resource Manager FG42. In an exemplary case, the criteria for determining whether an application should be moved rather than scaled up are based on the relative performance of the replicated applications. More specifically, if one application is performing much worse (> 50%) than the other copies, the recommendation will be to move the application. Likewise, if the new sample triggers a low threshold sliding window violation and the application has more than the minimum number of copies running, a recommendation will be sent to the Resource Manager FG42 requesting that the copy of the application that is experiencing the worst performance be scaled down.
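The sliding window test and the scale-up/move/scale-down decision described above can be sketched as follows. This is a minimal illustration, not the QoS Managers' actual code; it assumes a latency-like requirement (larger values are worse), treats the violation threshold as a fraction of the requirement value, fires a window once more violations than allowed fall inside it, and interprets "much worse (> 50%)" as the worst copy exceeding the mean of the other copies by more than 50%. `SlidingWindow` and `recommend` are hypothetical names.

```python
from collections import deque
from statistics import mean


class SlidingWindow:
    """Counts threshold violations over the last `size` samples.

    Assumption: the monitored value is latency-like (larger is worse),
    and `threshold_pct` is a fraction of the specified requirement value.
    """

    def __init__(self, size, max_violations, threshold_pct, above):
        self.samples = deque(maxlen=size)  # oldest sample drops off automatically
        self.max_violations = max_violations
        self.threshold_pct = threshold_pct
        self.above = above  # True: high (scale up) window; False: low (scale down)

    def add(self, value, requirement):
        limit = self.threshold_pct * requirement
        violated = value > limit if self.above else value < limit
        self.samples.append(violated)
        # Triggered once more violations than allowed fall inside the window.
        return sum(self.samples) > self.max_violations


def recommend(high_hit, low_hit, copy_latency, min_copies, max_copies):
    """Return ('scale_up' | 'move' | 'scale_down', host) or None.

    copy_latency maps host -> recent latency of the copy on that host.
    """
    n_copies = len(copy_latency)
    worst = max(copy_latency, key=copy_latency.get)
    if high_hit and n_copies < max_copies:
        others = [v for h, v in copy_latency.items() if h != worst]
        if others and copy_latency[worst] > 1.5 * mean(others):
            return ("move", worst)      # one copy is much worse than the rest
        return ("scale_up", worst)
    if low_hit and n_copies > min_copies:
        return ("scale_down", worst)    # shut down the worst-performing copy
    return None
```

For example, with a window of 5 samples, 2 allowed violations, and a threshold of 80% of a 100 ms requirement, the samples 90, 50, 95, 99 trigger the high window only on the fourth sample, when the third violation lands inside the window. Requiring several violations per window is what damps out isolated spikes.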

FG5 - RESOURCE (APPLICATION) CONTROL FUNCTIONAL GROUP
As discussed above, the Resource Control capabilities provided by the Resource Management Architecture consist of controlling application startup, configuration, and shutdown on hosts within the distributed environment. This capability, known as Application Control or Program Control (hereafter referred to as Program Control), provides a powerful distributed configuration capability. The Program Control capabilities permit an operator to start up and control applications running on platforms throughout the distributed environment via an easy-to-use interactive display. These capabilities are provided by the Application Control functional group FG5.

More specifically, the Application Control functional group provides application control (i.e., Program Control) capabilities which permit starting, stopping, and configuring applications on each of the hosts in the distributed environment. The functional group provides both interactive operator control of the distributed environment and automatic control via configuration orders received from the Resource Allocation Decision-Making functional group FG4, i.e., the Resource Manager component. The interactive controls allow an operator to create, load, save, and edit pre-defined system configurations, e.g., lists of applications that are to be run, with or without specific host mappings; determine the status and configuration of currently running programs; and start and stop any or all applications. Both static (operator-entered) mappings of applications to hosts and dynamic mappings of applications to hosts (where the Resource Allocation Decision-Making functional group FG4 will be queried to determine the proper mapping at run-time) advantageously can be defined. The functional group also provides application fault detection capabilities which are triggered by the unexpected death, i.e., fault, of an application that was started by the functional group. A basic host fault detection capability is also provided, which is triggered by failure to receive heartbeat messages from functional group components running on a particular host.
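Host fault detection of this kind reduces to tracking when each host last sent a heartbeat. Elsewhere in this description the Program Control Agents send heartbeats approximately once per second, and the loss of three consecutive heartbeats results in declaring the host down. A minimal sketch under those figures follows; `HeartbeatMonitor` is a hypothetical name, the clock is injectable so the logic can be exercised without real delays, and notifying linked functions is left to the caller.

```python
import time


class HeartbeatMonitor:
    """Declares a host down after `max_missed` consecutive missed heartbeats.

    Assumptions: heartbeats are expected every `period` seconds and
    `check()` is called periodically by the monitoring loop.
    """

    def __init__(self, period=1.0, max_missed=3, clock=time.monotonic):
        self.period = period
        self.max_missed = max_missed
        self.clock = clock
        self.last_seen = {}  # host -> time of its most recent heartbeat
        self.down = set()    # hosts currently declared down

    def heartbeat(self, host):
        self.last_seen[host] = self.clock()
        self.down.discard(host)  # a heartbeat means the host is back up

    def check(self):
        """Return hosts newly declared down since the previous check."""
        now = self.clock()
        newly_down = []
        for host, seen in self.last_seen.items():
            missed = int((now - seen) // self.period)
            if missed >= self.max_missed and host not in self.down:
                self.down.add(host)
                newly_down.append(host)  # caller notifies linked functions
        return newly_down
```

Returning only the newly failed hosts means that each failure is reported to the Resource Manager and the displays once, rather than on every polling cycle.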



A brief description of each function provided by the functional group FG5 is provided below; a detailed discussion of the Resource Control functional group FG5 and associated data flow will be provided in discussing FIG. 4.

1) Program Control Agents FG52A-FG52N: A Program Control agent, generally denoted FG52, resides on each of the hosts A-N (i.e., PCA-PCN). Each agent is responsible for providing direct control over startup and shutdown of applications on its respective host. The agent receives control orders from the Program Control function FG50 and is then responsible for implementing the orders. In an exemplary case, the agents implement the orders via system call mechanisms specific to the particular operating system. In addition, the agent also provides feedback to the Program Control function FG50 regarding the current status of all applications running on a particular host.

2) Program Control FG50: maintains the application state information for the Program Control functional group FG5. It also serves as the decision-making component of the Program Control functional group. The Program Control function FG50 receives application control (startup, shutdown, or configuration) requests from the Program Control Displays FG54A-FG54N and from the Resource Management functional group FG4. Using information from the System Specification Files FG32, these high-level control requests are dynamically translated into specific control orders, which are sent to the individual Program Control agents FG52A-FG52N. Program Control FG50 provides application status and configuration information back to the Resource Manager FG42.

3) Program Control Displays FG54A-FG54N: serve as the GUI for interactive control of distributed applications. The Program Control Displays FG54A-FG54N allow an operator to see and control the status of applications running on each host in the distributed environment. The Program Control Displays FG54A-FG54N also provide the user the ability to determine the status of each of the components of the Program Control architecture. Predefined scenario configurations defined in Program Control Configuration Files FG56 advantageously can be loaded and edited via the Displays. It should be mentioned that new Program Control Configuration Files can also be created and saved via the Displays. As illustrated in FIGS. 2A, 2B, multiple Program Control Displays FG54A-FG54N can be run simultaneously, with application status changes being reflected at each display.

4) Configuration Files FG56: contain an ordered set of applications that can be loaded at the Program Control Display and then either edited or executed. The Configuration Files can contain both dynamic and static application-to-host mappings. For static application-to-host mappings, an application will, by default, be started on a specified host. For dynamic application-to-host mappings, the application will have a default host to start on, but the Resource Manager FG42 will be queried at run-time to determine where the application actually should be placed. The Configuration Files FG56 also contain all information on how to start, stop, and configure an application, with the exception of environment variable settings for the application, which are set based on the System Specification Files FG32.



It should be mentioned here that the Program Control functional group employs the application startup and shutdown information defined in the System Specification Files FG32. When an application entry is first created interactively at one of the Program Control Displays FG54A-FG54N, all of the startup and shutdown information for that application, as specified in the System Specification Files FG32, is loaded in as default settings. Once a configuration file entry has been created, all configuration information on the application is read in from the configuration file, except for the application environment variable settings, which are still set based on the System Specification Files FG32.

As mentioned above, a Program Control agent resides on each host. The agent is responsible for providing direct control over application startup and shutdown. The agent receives control orders from the Control component and is then responsible for implementing the orders. Each of the PC Agents FG52A-FG52N implements application startup and shutdown orders via system call mechanisms specific to the particular operating system of the host. For example, on the Unix platforms, the fork() and execv() function calls are used to create the application. The csh command is executed to start up the applications. Moreover, if the application needs to run in a console, an xterm is configured for the application to run in. In addition, if logging of either stdout or stderr is specified, the proper redirection operators are configured and the output log file is set to '/usr/tmp/<userid>_<appname>_<pid>.log'. All environment variables needed by the application are also configured and passed in at the execv() call. The current working directory is also set by the chdir() command, and the new application is made a process group leader via the setpgid() function. Other operating systems invoke applications using different calls.
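The Unix startup sequence above (fork, set the working directory, become a process group leader, redirect output to a per-process log file, exec with an explicit environment), together with the signal-based shutdown and SIGKILL fallback described in the paragraph that follows, can be sketched with Python's `os` module. This is an assumption-laden illustration, not the PC Agent code: it execs the executable directly via `os.execve` rather than through csh, omits xterm console handling, and uses a hypothetical default shutdown timeout. `start_application` and `stop_application` are illustrative names.

```python
import getpass
import os
import signal
import time


def start_application(app_name, exe, args, workdir, env, log_dir=None):
    """Fork and exec an application, roughly as a Unix PC Agent might.

    Sketch only: runs `exe` directly via execve instead of through csh,
    and omits xterm console handling. Returns the child's pid.
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.setpgid(0, 0)   # make the child a process group leader
            os.chdir(workdir)  # set the current working directory
            if log_dir is not None:
                log = os.path.join(
                    log_dir, f"{getpass.getuser()}_{app_name}_{os.getpid()}.log")
                fd = os.open(log, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
                os.dup2(fd, 1)  # stdout -> log file
                os.dup2(fd, 2)  # stderr -> log file
                os.close(fd)
            os.execve(exe, [exe, *args], env)  # pass the environment explicitly
        except BaseException:
            os._exit(127)  # never fall back into the parent's code path
    # Parent: also set the group, avoiding a race with the child's setpgid.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # child already exec'd or exited; its own setpgid covers it
    return pid


def stop_application(pid, sig=signal.SIGINT, timeout=5.0):
    """Signal the app's process group; escalate to SIGKILL after `timeout`.

    Returns True if SIGKILL was needed. Assumes the caller is the parent.
    """
    os.killpg(pid, sig)  # the group id equals the leader's pid
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, _ = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return False  # the application exited on its own
        time.sleep(0.05)
    os.killpg(pid, signal.SIGKILL)  # shutdown time elapsed: force it
    os.waitpid(pid, 0)
    return True
```

Making each application its own process group leader is what allows `killpg()` to stop the application together with any helper processes it spawned, such as a shell wrapper and its child.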

In order to stop an application on the Unix platforms, if a signal is to be sent to the application, the killpg() function is used; otherwise, if a script or command is to be executed to shut down the application, the csh command is executed (via the system() function), specifying the full path and executable name of the command along with any arguments for the command. It should be noted that if the application default shutdown time elapses and the application has not died, the respective one of the Program Control Agents FG52A-FG52N advantageously sends a SIGKILL signal to the application by calling killpg().

As illustrated in FIGS. 1A, 1B, the Program Control Agents (PCA-PCN) advantageously can be instantiated on stand-alone hosts A-N. In that case, the Program Control Agents PCA-PCN (FG52A-FG52N in FIGS. 2A, 2B) send heartbeat messages to Program Control FG50 approximately once per second to indicate that they are still "up and running." Moreover, every ten seconds, the Program Control Agents PCA-PCN (FG52A-FG52N) send complete configuration information on all running applications to Program Control FG50. It should be noted that the terminology employed in FIGS. 1A, 1B differs from that in FIGS. 2A, 2B to emphasize the distinction between software instantiated on a host and a function provided by the Resource Management Architecture.

The Program Control function FG50 is the decision-making component of the Program Control functional group FG5. It maintains complete information on everything that is running across all platforms in the distributed environment. The Program Control function FG50 receives input data from PCA-PCN (FG52A-FG52N), the Program Control Displays FG54A-FG54N, the Resource Manager FG42, and the Host Discovery function FG14.

It will be appreciated from the preceding discussion that the Program Control FG50 provides startup and shutdown orders to the Program Control Agents FG52A-FG52N based on operator- or Resource Manager-initiated orders. If the Program Control Agents report that an application has terminated abnormally, the Program Control FG50 provides a notification to the Resource Manager FG42, to the Program Control Displays FG54A-FG54N, and to any other component to which it is connected. When the Program Control function FG50 is first brought up, it can be configured to attempt to start Program Control agents on every host defined in the System Specification Files. The Program Control function FG50 will also attempt to start a Program Control Agent on a newly discovered host (discovered via the Host Discovery function FG14) if Host Discovery has been enabled on the Program Control Displays FG54A-FG54N.

The Program Control function FG50 also receives periodic heartbeat messages, e.g., once per second, from each of the Program Control Agents FG52A-FG52N, as discussed above. If Fault Detection has been enabled at the Program Control Displays FG54A-FG54N and three consecutive heartbeat messages from an Agent, e.g., FG52A, are missed, the host that the agent is running on is declared down and all linked functions, including the Resource Manager FG42 and the Displays FG54A-FG54N, are notified.

As mentioned above, the Program Control function FG50 sends out periodic application status updates as well as immediate notification when applications are started up, are shut down, or fail. These notifications are sent out to all linked functions.

It should be noted that the Program Control function FG50 uses the same message traffic and internal processing for handling application startup and shutdown orders received from either the Resource Manager FG42 or the Program Control Displays FG54A-FG54N. However, if a startup order received from one of the Program Control Displays FG54A-FG54N indicates that the Resource Manager FG42 should determine where to run the application, a request to allocate the application is sent to the Resource Manager FG42. When no response is received from the Resource Manager FG42 within a predetermined timeout period, the Program Control function FG50 will automatically start the application on the default host. Moreover, when an application startup cannot proceed due to an unfulfilled application startup dependency, a request will be made to the Resource Manager FG42 to attempt to resolve the dependency. If the Resource Manager FG42 either cannot resolve the dependency or no response is received within a predetermined timeout period, the application startup will fail, and a "dependency failed" indication will be sent to the Program Control Display. It will be appreciated that this will cause the application status to be displayed in, for example, yellow and post an alert to the Alert window on one of the Program Control Displays FG54A-FG54N.

Preferably, the Program Control function FG50 also handles simple startup timing dependencies between applications and will reorder a list of applications that were selected to be started simultaneously if doing so will resolve startup order dependencies between the applications. Otherwise, the Program Control function FG50 sends a request to the Resource Manager to attempt to resolve the dependencies.

The Program Control Display serves as the operator console for controlling the distributed environment. From the Display, shown in FIGS. 5A, 5B, the operator can:

1) see the status and configuration of currently executing applications A1-NM;
2) see the status of Program Control Agents PCA-PCN on each host A-N;
3) see and browse the application system structure defined in the System Specification Files FG32;
4) load configuration files FG56;
5) save configuration files FG56;
6) edit the configuration of applications that are not currently running;
7) create new application entries by dragging an application, application system, or application subsystem icon onto the application status area;
8) manually start specific applications;
9) manually stop specific applications;
10) manually start all applications that have the "Start All" flag set;
11) manually stop all applications;
12) turn host fault detection on or off (if on, loss of 3 consecutive heartbeats from a Program Control Agent will result in declaring the host down); and
13) turn host discovery on or off (if on, a new host message from the Host Discovery component will result in attempting to start up a Program Control Agent on the new host).

It will be appreciated from FIGS. 2A, 2B that multiple Program Control Displays FG54A-FG54N advantageously can be run simultaneously. If this is done, any configuration change actions will be reflected on all the displays. Whenever application stop or start actions are taken by the display operator, a message is sent to the Program Control function FG50, which is responsible for enacting the start or stop action. The Program Control function FG50 also sends indications of any status changes to the Program Control Displays FG54A-FG54N as soon as the status changes are seen. In addition, periodic status updates are also sent to the Program Control Displays FG54A-FG54N.

The Program Control Configuration Files are text files that are read in by the Program Control Display when the operator wishes to load a new application configuration. A Configuration File is an ASCII file containing a list of applications. The format of an entry in a Configuration File is shown in Table 1 below.

Table 1

Application     TACFIR...
Host
Display         ombrldl:0.0
Auto_Start
RM_Start
Console
Time_Delay
StartupDir      $ENV_SIM_VERSION/TACFIRE/...
StartupExe      $ENV_SIM_VERSION/TACFIRE/...
StartupArgs     ... $DIS_PORT_NUM ... %(HOSTNAME, AAW:...:CFF_Broker)
ShutdownExe     SIGINT
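Table 1 lays out one entry as field/value pairs. As a sketch only, assuming a hypothetical on-disk layout of one "FieldName value" pair per line (the exact file syntax is not given here), an entry could be read as shown below: the 0/1 flags become booleans, Time_Delay becomes a number of seconds, $VARIABLES are expanded with `string.Template`, and the %(UNIQUE, low, high, context) directive explained in the StartupArgs field description is resolved to a number unique within its context. Handing out the lowest unused number is an assumption, as are the names `parse_entry` and `UniqueAllocator`. Other run-time directives such as %(HOSTNAME, ...) are left untouched.

```python
import re
from string import Template

FLAG_FIELDS = ("Auto_Start", "RM_Start", "Console")  # valid values: 0 = NO, 1 = YES
EXPAND_FIELDS = ("StartupDir", "StartupExe", "StartupArgs", "LogDir")
UNIQUE_RE = re.compile(r"%\(UNIQUE,\s*(\d+),\s*(\d+),\s*(\w+)\)")


class UniqueAllocator:
    """Hands out numbers in [lo, hi] that are unique within a named context."""

    def __init__(self):
        self.used = {}  # context name -> set of numbers already handed out

    def next(self, lo, hi, context):
        taken = self.used.setdefault(context, set())
        for n in range(lo, hi + 1):  # assumption: lowest unused number wins
            if n not in taken:
                taken.add(n)
                return n
        raise RuntimeError(f"no unused number left in context {context!r}")


def parse_entry(text, env, unique):
    """Parse one entry laid out as 'FieldName value' lines (assumed layout)."""
    entry = {}
    for line in text.splitlines():
        name, _, value = line.strip().partition(" ")
        if name:
            entry[name] = value.strip()
    for name in FLAG_FIELDS:
        raw = entry.get(name, "0")
        if raw not in ("0", "1"):
            raise ValueError(f"{name} must be 0 or 1, got {raw!r}")
        entry[name] = raw == "1"
    entry["Time_Delay"] = int(entry.get("Time_Delay", "0"))  # seconds
    for name in EXPAND_FIELDS:
        if name in entry:
            # Unknown $VARS are left as-is rather than raising an error.
            entry[name] = Template(entry[name]).safe_substitute(env)
    if "StartupArgs" in entry:
        entry["StartupArgs"] = UNIQUE_RE.sub(
            lambda m: str(unique.next(int(m.group(1)), int(m.group(2)), m.group(3))),
            entry["StartupArgs"])
    return entry
```

Sharing one `UniqueAllocator` across entries is what makes a second resolution of %(UNIQUE, 1, 40, Isis) yield a different number than the first.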
The Configuration File advantageously can include the following fields:

1) The Application field, which identifies the full application name as defined in the System Spec. Files FG32 (e.g., System:Subsystem:Application).

2) The Host field, which is the desired or default host that this application should be started on.

3) The Display field, which is an optional field used when graphical display output from an application needs to be rerouted to a display on a different host.

4) The Auto_Start flag, which identifies whether the application is to be started automatically if the "Start All" action is selected by the operator from the Program Control Display. (If the flag were set to "1," the application would be started. If the flag were set to "0," it would not be started.)

5) The RM_Start flag, which identifies whether the Resource Manager should be queried at run-time to determine what host the application should be started on. The valid values are 0 for "NO" and 1 for "YES".

6) The Console flag, which identifies whether the application needs to be started in an xterm window. The valid values are 0 for "NO" and 1 for "YES".

7) The Time_Delay field, which identifies how many seconds to wait after the previous application has been started before starting this application.

8) The StartupDir field, which identifies the current working directory that is to be set prior to starting up the application. This directory is usually the same as the directory where the executable for the application resides but does not have to be. As this example shows, environment variables may be used in the path.

9) The StartupExe field, which identifies the entire path and name of the application executable.

10) The StartupArgs field, which contains all the argument values needed for this particular application. As this example indicates, the argument values can
be dynamically set at run time if needed. Environment variables may also be used within the argument list. In this example, the %(UNIQUE, 1, 40, Isis) argument would yield a number from 1 to 40 which is unique within a context named "Isis". Another resolution of %(UNIQUE, 1, 40, Isis) would yield a different number.

11) The ShutdownExe field, which identifies the signal that Program Control is to use to shut down this application. Some examples would be SIGINT, SIGTERM, or SIGKILL. A shutdown script can also be used to shut down the application. (In that case, there would be ShutdownDir, ShutdownExe, and ShutdownArgs fields listed. The shutdown fields would be used exactly the same way as the startup fields.)

12) The LogType field, which identifies which outputs are to be written to the specified log file. The valid values are STDOUT, STDERR, and LOG_ALL. STDOUT is the normal output of the application (stdout). STDERR is the error output of the application (stderr). LOG_ALL writes both stdout and stderr outputs to the file.

13) The LogDir field, which indicates the directory where the log file will be written. Again, environment variables may be used here. The log file name will be "<userid>_<appname>_<pid>.log", where <appname> is the full application name as specified in the Application field, <userid> is the userid of the current user under which the program control application is running, and <pid> is the system-assigned process id of the application being executed.

FG6 - DISPLAY FUNCTIONAL GROUP

A number of displays which show system configuration data and instrumentation data in near real-time are included as part of the Resource Management Architecture. These displays support operator and user monitoring of the operation of the distributed environment, including host and network statuses and performance, application system statuses and performance, as well as the status and performance of the other Resource Management architecture functions. Most of the displays use OpenGL and Motif, the latter being built with ICS's Builder Xcessory toolkit, and run on Silicon Graphics (SGI) platforms in an exemplary case. Several of the displays can also run on the Sun Solaris platforms. The displays that make up the display functional group FG6 include:

1) Host Displays FG62A-FG62N. Show the layout of hosts along with host status, network connectivity, and process statuses.

2) Path Display FG64. Shows the status of applications in key end-to-end data flow paths along with performance and load graphs.

3) Resource Management Decision Review Display FG68. Shows a summary of allocation decisions made by the Resource Management system along with timing information and host fitness scores.

4) Graph Tool Instrumentation Displays FG69A-FG69N. Provide a user-configurable set of display widgets used for run-time monitoring of instrumented status and performance information.

5) System Readiness Display FG66. Shows the status of each hardware and software system, subsystem, and application defined in the System Specification Files and allows the operator to interactively change system and subsystem priorities.

FIGS. 6A, 6B represent a screen capture of an exemplary one of the Host Displays FG62A-FG62N, which provide graphical representations of various sets of the hosts A-N in the distributed environment. The Host Displays show the status of each host, host network connectivity, and the status of interesting processes running on the hosts. The Host Display operator can also select hosts shown on the Host Display and bring up real-time graphs of system performance for the selected hosts, including CPU utilization, memory utilization, network packets in, network packets out, and paging activity. A screen capture of host-specific performance information is provided in FIGS. 7A, 7B.

FIGS. 8A, 8B represent a screen capture of a representative Path Display FG64, generated by the Resource Management architecture, which shows the status of key system data flow paths consisting of multiple application stages. The number of copies of each application in the path is shown, labeled with the host on which the application is running. In addition, it should be mentioned that as many as three real-time graphs can be produced to depict run-time performance and load metrics related to the applications in the selected data path.

FIGS. 9A, 9B represent a screen capture of the Resource Management Decision
Review Display FG68, which advantageously can provide a summary of allocation and
reallocation actions taken by the Resource Manager FG42. For each action, timing
information regarding how long it took the Resource Management functions, e.g., the
Resource Manager FG42 and the Program Controller FG50, to both arrive at a decision and
enact the decided action is shown, along with the host fitness scores that were used in
arriving at the allocation decision.

FIGS. 10A, 10B and 11A, 11B are screen captures of the Graph Tool
Instrumentation Displays FG69A-FG69N, which depict user-configurable displays capable
of receiving data via standardized message formats and open interfaces. The Graph Tool
Displays FG69A-FG69N allow the operator to select and configure various display widgets
(line graphs, bar charts, pie charts, meters, and text boxes) to build a desired display layout.
Data sources for driving the widgets can also be selected interactively.

FIGS. 12A, 12B represent a screen capture of the System Readiness Display
FG66, which advantageously can be a Java™ display with a CORBA™ interface. The
display FG66 shows the status of each hardware system, host, application system,
subsystem, and application defined in the System Specification Files. The top portion of the
display shows a summary status for each defined application system. It should be noted that
the display operator can also change system and subsystem priorities and send the changed
priorities to the Resource Manager function FG42.

As mentioned above, the RMComms middleware package provides object-oriented
client-server services for message communication between distributed applications and
function modules. The middleware provides location transparency and automatic socket
connections and reconnections between client and server applications. These services
advantageously can be accessed through an object-oriented API which allows client and
server objects to be easily created and to exchange user-defined message data. The abstraction
provided by the API allows the user to quickly and easily create distributed applications
without needing to be aware of the details of the underlying network mechanisms. The
RMComms middleware provides the following functions:

• provides location transparency between clients and servers
• provides a simple, powerful object-oriented client-server API
• supports reliable transport of user-defined message data
    - based on Berkeley sockets
    - uses TCP for message transport
    - uses UDP multicast for identification of new clients or servers
    - servers identified by unique assigned UDP/TCP port numbers
• provides general purpose callback function registration capabilities
    - user-specified message callback functions invoked when specified messages arrive
    - user-specified connection status callback function invoked when new client-server connections are established or existing connections are broken
• support for multi-threading
    - supports both polled and asynchronous I/O
    - thread-safe
• provides automatic connections between clients and servers
    - supports multiple client and server connections within the same application
    - provides automatic connections to new clients / new servers
    - supports simultaneous many-to-many client-server connections
    - no separate "naming service" or "application registration" components
• provides automatic client-server connection fault detection and recovery
    - provides fault detection mechanisms based on timeouts and broken connections
    - supports fault recovery via automatic reconnections between clients and servers
• provides basic support for data marshalling between machine architectures
    - byte-swapping
    - explicit message data type specification
    - all message data sent out using network byte order
• provides basic capabilities for reading the system clock and performing time conversions
• allows registration of user-defined signal (interrupt) handler functions
• layered object-oriented design and implementation
• cross-platform support:
    - SGI IRIX 6.3/6.4/6.5
    - Sun Solaris 2.5.1/2.6/2.7/2.8
    - Solaris x86 2.7
    - HP HP-UX 10.20
    - Linux 2.1/2.2
    - Windows NT 4.0
    - Windows 95/98/2000
• language support using native and GNU compilers

The RMComms middleware is implemented as a shareable object-oriented C++
library. The library provides four primary object classes, which are detailed in attached
Appendix C. It will be appreciated that the applications link with this library and can then
instantiate client and server objects for communicating with other local or remote
applications. It should be mentioned that the application source code must also include a set
of header files that allow connections between client and server objects, where each server
type is assigned a server port number. For clients and servers that want to communicate, both
the client and the server objects are created specifying the same server port number. Multiple
servers of the same type can also be created, which all use the same server port number. This
advantageously provides the ability for many-to-many client-server connections to be
established, as illustrated in FIG. 4. Control of which servers the clients actually connect
to is handled on the client side; clients can specify whether they wish to establish connections
with all servers in the distributed environment, with a particular set of servers, or with all
servers running on a particular set of hosts.

The operation of the Resource Management Architecture will now be described while
referring to FIGS. 13A-13C, which illustrate various operations in the distributed
environment. More specifically, the Resource Management Architecture of the system
illustrated in FIG. 13A includes hosts A-N, where host A provides a video source server
application A-1, host B provides a video distribution application B-1, a contract application
B-2, and a host load monitor B-3, and host C provides a display broker application C-1
applying video signals to a display driver C-2. It will be appreciated that host D is idle and
that the connections between the various hosts constitute the network 100'. In addition, the
Resource Management Architecture of FIG. 13A instantiates various functions, e.g., an
instrumentation broker FG26', a QoS manager FG44', a resource manager FG42' and a
program control FG50'. The instrumentation broker FG26' receives data from each of the
applications running in the distributed environment, although only the lines of
communication between the applications running on host B are actually depicted. From the
discussion above, it will be appreciated that each of the applications is linked to an
Instrumentation API.
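The RMComms rule that clients and servers pair up by server port number, with the client side choosing which of the matching servers to connect to, can be modelled in a few lines. This is a hedged, in-process sketch only: `ServerInfo`, `ConnectScope`, `ClientSpec`, and `selectServers` are hypothetical names and do not reproduce the RMComms class interfaces described in Appendix C.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a running server object.
struct ServerInfo {
    std::string name;
    std::string host;
    std::uint16_t port;  // the server type is identified by its port number
};

// Client-side connection policies described in the text.
enum class ConnectScope { AllServers, NamedServers, ServersOnHosts };

struct ClientSpec {
    std::uint16_t port;            // must match the server port number
    ConnectScope scope;
    std::set<std::string> filter;  // server names or host names, depending on scope
};

// Returns the servers this client would connect to.
std::vector<ServerInfo> selectServers(const ClientSpec& client,
                                      const std::vector<ServerInfo>& known) {
    std::vector<ServerInfo> selected;
    for (const auto& server : known) {
        if (server.port != client.port) continue;  // different server type
        bool match = false;
        switch (client.scope) {
            case ConnectScope::AllServers:     match = true; break;
            case ConnectScope::NamedServers:   match = client.filter.count(server.name) > 0; break;
            case ConnectScope::ServersOnHosts: match = client.filter.count(server.host) > 0; break;
        }
        if (match) selected.push_back(server);
    }
    return selected;
}
```

Because several servers may share one port and several clients may select them, the same function naturally yields the many-to-many connection pattern of FIG. 4.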

Referring now to FIG. 13B, a QoS violation and its consequences are depicted. In
particular, the instrumentation broker FG26' provides data to the QoS manager FG44' which
is indicative of a QoS violation. The QoS manager FG44' notifies the resource manager
FG42' of the violation; the resource manager determines that duplicate copies of the
applications running on host B are required and that these copies should be placed on host
D. The resource manager FG42' transmits instructions to the Program Control function
FG50', which starts copies of the running applications, i.e., a video distribution application
D-1, a contract application D-2, and a host load monitor D-3, on host D. FIG. 13C
illustrates shutdown of the application copies running on host B. It will be appreciated that
this shutdown may be initiated responsive to the original QoS violation, another QoS
violation, or a query from the user.

Having discussed the various functions and features of the Resource Management
Architecture in general terms, selected functions and features will now be described in detail.
It will be appreciated that the discussion of the various functions will be signaled using the
designations established with respect to FIGS. 2A, 2B.

FG42 - Resource Manager Function



As mentioned above, the Resource Manager FG42 is the primary decision
component of the Resource Management functional group. It is responsible for:

(1) responding to application and host failures by determining if and what
recovery actions should be taken;

(2) determining if and where to place new copies of scalable applications or
which scalable applications should be shut down when the QoS Managers
indicate that scale-up or scale-down actions should be taken based on
measured application performance;

(3) determining where new applications should be placed when requested to do
so by Program Control; and

(4) determining which and how many applications should run based on
application system (mission) priorities.

In order to accomplish these tasks, the Resource Manager FG42 maintains a global view of
the state of the entire distributed environment, including status information on all hosts,
networks, and applications. In addition, the Resource Manager FG42 also calculates software
and hardware readiness metrics and reports these readiness values for display purposes.

The Resource Manager FG42 is an object-oriented multi-threaded application written
in C++, which uses the RMComms middleware for all external communication. The
Resource Manager FG42 communicates with the various software components instantiating
the (1) Program Control FG50, (2) Hardware Broker FG40, (3) QoS Managers FG44A -
FG44N, (4) QoS Specification Control FG29, (5) Readiness Broker in Readiness Display
FG66, (6) Globus Broker (not shown), and (7) RM Decision Review Displays
FG68A-FG68N.

It will be appreciated that the Resource Manager FG42 receives status and failure
information about hosts and networks from the Host and Network Monitoring functional
group FG1, and about applications from the Program Control functional group FG5. This
information includes periodic status updates as well as immediate updates when statuses
change, e.g., when a new host is detected or an application fails. In the case of any
application shutdown, information as to whether the applications were intentionally
shut down or whether the application actually failed advantageously can be provided. The
Program Control function FG50 also issues requests to the Resource Manager FG42
whenever new applications need to be dynamically allocated and whenever the Program
Control function FG50 determines that the Resource Manager FG42 needs to assess and
attempt to resolve inter-application dependencies (e.g., one application which needs to be
running prior to starting up another application).

The Resource Manager FG42 responds to application faults and host failures by
determining whether the failed applications can and should be restarted and attempting to
determine where (and if) there are hosts available that the application can run on. When a
decision is made by the Resource Manager FG42, a message is sent to the Program Control
function FG50 specifying what application to start and where to put it. The same general
mechanism is used when the Program Control function requests that the Resource Manager
FG42 determine where to start new applications and/or how to resolve inter-application
dependencies; the Resource Manager FG42 responds with orders indicating which
applications to start and where to start them. The Resource Manager FG42 advantageously
can send application shutdown orders to the Program Control function FG50 requesting that
a certain running application be stopped; this can occur when the QoS Managers indicate that
certain scalable applications have too many copies running or when application system
priority changes (to lower priorities) occur, resulting in scaling back the application system
configuration. See FIGS. 13B and 13C and the associated discussion above.

The Resource Manager FG42 receives host load and host fitness information from
the Hardware Broker (Host Load Analyzer) function FG40. This information includes overall
host fitness scores, CPU-based fitness scores, network-based fitness scores, and memory and
paging-based fitness scores, along with the SPEC95 ratings of the hosts. This information is
received approximately once a second and includes information on all known hosts in the
distributed system. These scores are used by the Resource Manager FG42 for determining
the "best" hosts for placing new applications when:

(1) responding to requests from the QoS Managers to scale up additional copies
of an application;

(2) attempting to restart failed applications;

(3) responding to requests to dynamically allocate certain applications; and

(4) responding to application system (mission) priority changes which require
scaling up additional applications.

Advantageously, the Resource Manager FG42 also receives requests from the QoS
Managers FG44A - FG44N for scaling up, moving, or scaling down specific applications.
The Resource Manager FG42 responds to these requests by determining whether the request
should be acted upon and, if so, determines the specific action to take and issues orders to
the Program Control function FG50 to start up or shut down specific applications on specific
hosts. The QoS Managers FG44A - FG44N are responsible for monitoring specific system
performance metrics (e.g., quality of service, or QoS, requirements) via instrumentation and
determining if performance can be improved by scaling up or moving certain applications.
When this occurs, the QoS Managers send a request to the Resource Manager FG42
indicating that a new copy of a specific application should be started. If the QoS Managers
determine that the performance of a scalable application can be improved by moving an
application, a scale up request is first sent to the Resource Manager FG42 and, when the new
application has been started, a scale down request is then sent to the Resource Manager FG42.
Moreover, when the QoS Managers FG44A - FG44N determine that there are more copies
of a scalable application running than are needed, requests to shut down specific applications
are sent to the Resource Manager FG42.

It will be appreciated that the Resource Management Architecture distributes
functionality between the QoS Managers FG44A-FG44N and the Resource Manager FG42.
Thus, the QoS Managers determine what actions would potentially improve performance,
while the Resource Manager FG42 has final authority to determine whether to implement the
requested actions.
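The use of fitness scores to pick the "best" host in the cases listed above can be sketched as a simple maximum search. This is a minimal sketch under stated assumptions, not the Resource Manager's actual selection logic: it assumes a higher overall fitness score means a better host, that the candidate list is the set of hosts the System Specification Files allow for the application, and that hosts without a current fitness report are skipped. `HostFitness` and `bestHost` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-host record holding the scores the Hardware Broker reports.
struct HostFitness {
    double overall;
    double cpu;
    double network;
    double memoryPaging;
    double spec95;  // SPEC95 rating
};

// Picks the allowed host with the highest overall fitness score.
// Hosts with no current fitness report are skipped; ties keep the earlier candidate.
std::optional<std::string> bestHost(const std::vector<std::string>& candidates,
                                    const std::map<std::string, HostFitness>& scores) {
    std::optional<std::string> best;
    double bestScore = 0.0;
    for (const auto& host : candidates) {
        auto it = scores.find(host);
        if (it == scores.end()) continue;
        if (!best || it->second.overall > bestScore) {
            best = host;
            bestScore = it->second.overall;
        }
    }
    return best;
}
```

An empty result corresponds to the case where no suitable host is available, which the Resource Manager must handle (for example, by not restarting a failed application).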

It should be noted that when the Resource Manager FG42 is first started, it reads in
the System Specification Files FG32 (via calls to the System Specification Library (SSL)
FG34), which contain the list of hosts that are known to be operating in the distributed
environment and information on all applications that can be run in the distributed
environment. The application-level information includes where specific applications can be
run, which applications are scalable, which applications can be restarted, and any
dependencies between applications. In addition, the Resource Manager FG42 receives
updated application survivability specifications from the QoS Specification Control function.
This information overrides the application survivability information that was initially loaded
in from the System Specification Files FG32 for the specified application. The information
is used by the Resource Manager FG42 to determine whether the specific application will be
restarted if it fails at run-time.
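The precedence between the System Specification Files and later QoS Specification Control updates can be sketched as a two-level lookup. The names `RestartPolicy`, `loadFromSpecFiles`, `applyOverride`, and `shouldRestart` are hypothetical, and survivability is reduced here to a single "restartable" flag per application purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: survivability reduced to one "restartable" flag per application.
class RestartPolicy {
public:
    // Value loaded at start-up from the System Specification Files.
    void loadFromSpecFiles(const std::string& app, bool restartable) {
        fromSpecFiles_[app] = restartable;
    }

    // Update received later from QoS Specification Control; it takes precedence.
    void applyOverride(const std::string& app, bool restartable) {
        overrides_[app] = restartable;
    }

    // Decides whether a failed application should be restarted.
    bool shouldRestart(const std::string& app) const {
        auto o = overrides_.find(app);
        if (o != overrides_.end()) return o->second;
        auto s = fromSpecFiles_.find(app);
        return s != fromSpecFiles_.end() && s->second;  // unknown apps: no restart
    }

private:
    std::map<std::string, bool> fromSpecFiles_;
    std::map<std::string, bool> overrides_;
};
```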

It should also be noted that the Resource Manager FG42 sends application system and
hardware system readiness and system (mission) priority information to the Readiness
Broker, which is a translator within the Readiness Display FG66, and to the Globus Broker
(another broker, not shown). The Readiness Broker is responsible for driving a GUI/display
FG66, which shows the current readiness data and allows the system (mission) priorities to
be changed and sent back to the Resource Manager FG42. The Globus Broker provides
basically the same functionality, except that only a high-level subset of the readiness data
provided to the Readiness Broker is provided to the Globus Broker. The readiness
information sent to the Readiness Broker consists of readiness values for each application,
application subsystem, and application system defined in the System Specification Files
FG32. The scores advantageously can be based on the status (up/down) of the applications
and the percentage of potential copies of scalable applications that are currently running.
Host and network readiness scores are determined based on the host loads and host fitness
scores received from the Hardware Broker FG40.
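One possible reading of the application readiness rule is sketched below. The text does not give the exact scale or formula, so the [0, 1] range, the `AppStatus` fields, and `appReadiness` are all assumptions: a non-scalable application scores 1 when up and 0 when down, and a scalable one scores the fraction of its potential copies that are running. How application values roll up into subsystem and system values is not specified and is not attempted here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical status record for one application.
struct AppStatus {
    bool scalable;
    int runningCopies;    // copies currently up
    int potentialCopies;  // copies the specification allows (scalable apps)
};

// Readiness in [0, 1]; this scale is an assumption, not taken from the specification.
double appReadiness(const AppStatus& s) {
    if (!s.scalable) return s.runningCopies > 0 ? 1.0 : 0.0;
    assert(s.potentialCopies > 0);
    double fraction = static_cast<double>(s.runningCopies) / s.potentialCopies;
    return std::min(1.0, std::max(0.0, fraction));
}
```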

The Resource Manager FG42 also sends information about allocation and
reallocation decisions to the RM Decision Review Display FG68 (FIGS. 9A, 9B).
Information on the decision that was made, what event the decision was in response to, and
how long it took to both make the decision and implement the decision is sent to the
display. In addition, information about the top choices for where an application could
potentially have been placed is also sent (if applicable); this information includes the host
fitness scores for the selected host and for other hosts which could have been selected.

As described above, the Resource Manager function FG42 communicates with
Program Control FG50, the Hardware Broker FG40, the QoS Managers FG44A-FG44N,
QoS Specification Control (not shown; a legacy function), the Readiness Broker of the
Readiness Display FG66, the Globus Broker (not shown), and the RM Decision Review
Display FG68 using the RMComms middleware. The message formats and contents of each
message that is exchanged between the Resource Manager function FG42 and other
functional elements of the Resource Management architecture are described in
CD-Appendix D. The timing and/or event trigger for each message is also described.

FG40 - Host Load Analyzer (Hardware Broker) Function

The Hardware Broker FG40 provides the host load analysis function of the Resource
Management functional group FG4. It is responsible primarily for determining the host and
network loads on each host within the distributed computing environment. The Hardware
Broker FG40 assigns a set of fitness scores for each host and periodically provides the list
of fitness scores to the Resource Manager FG42. FIG. 14 illustrates the connectivity and
high-level data flow between the Hardware Broker and the other Resource Management and
Resource Management-related components.

The Hardware Broker FG40 is an object-oriented multi-threaded application written
in C++, which uses the RMComms middleware for all external communication. It receives
operating system-level statuses and statistics for each host from the History Server(s)
FG12A-FG12N. This information is used for calculating CPU, network, memory, paging
activity, and overall fitness scores for each host. The Hardware Broker periodically (once
per second) sends the list of host fitness scores to the Resource Manager FG42.

When the Hardware Broker FG40 is first started, it reads in the System Specification
Files FG32 (via calls to the System Specification Library (SSL) FG34), which contain the list
of hosts that are known to be in the distributed environment. The Hardware Broker also reads
in the file networks.dat, which contains a list of information about the bandwidth and
maximum packet sizes on known network subnets. It should be mentioned that this data is
used for converting host network load information based on packet counts to load
information based on bytes per second and percentage of available bandwidth.
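The conversion from packet counts to a byte rate and a bandwidth percentage can be sketched as follows. For illustration only, it assumes that every packet is the subnet's maximum (MTU) size, which gives an upper-bound estimate, and that the bandwidth in networks.dat is expressed in bits per second; `SubnetInfo`, `NetLoad`, and `packetsToLoad` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical entry from networks.dat for one subnet.
struct SubnetInfo {
    double bandwidthBitsPerSec;  // assumed unit
    double mtuBytes;             // maximum packet size
};

struct NetLoad {
    double bytesPerSec;
    double percentOfBandwidth;
};

// Upper-bound estimate: every packet is assumed to be MTU-sized.
NetLoad packetsToLoad(double packetsPerSec, const SubnetInfo& subnet) {
    assert(subnet.bandwidthBitsPerSec > 0.0);
    double bytesPerSec = packetsPerSec * subnet.mtuBytes;
    double percent = 100.0 * (bytesPerSec * 8.0) / subnet.bandwidthBitsPerSec;
    return {bytesPerSec, percent};
}
```

For example, a 10 Mbit/s subnet with a 1500-byte MTU carrying 100 packets per second would be estimated at 150,000 bytes per second, or 12% of the available bandwidth.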

It should be mentioned that there are two other RMComms interfaces that the
Hardware Broker FG40 uses. Periodically (approximately every three seconds), the Hardware
Broker FG40 sends a list of overall and network host fitness scores to the Hardware Broker
Instrumentation Display FG69A - FG69N. As mentioned above, these displays were
constructed using the Graph Tool described in the Instrumentation Graph Tool Display.
Additionally, the Hardware Broker FG40 can receive host-based network load data from the
Remos Broker FG16, which receives network data via the Remos Network Monitoring
software (denoted 2 in FIGS. 2A, 2B). If Remos network data is available for any of the
hosts that are being monitored, the Remos data is used for the network fitness score
calculation for that host rather than the host network data received from the History
Server(s).
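The source-selection rule above, combined with the freshness limit described for the Remos Host Network Bandwidth Database object (read here as 5 seconds), reduces to a small decision function. `NetSource`, `kMaxRemosAgeSec`, and `chooseNetworkSource` are hypothetical names, and times are assumed to be in seconds on a common clock.

```cpp
#include <cassert>
#include <optional>

enum class NetSource { Remos, HistoryServer };

// Staleness limit for Remos data; 5 s is this sketch's reading of the RemosDB description.
constexpr double kMaxRemosAgeSec = 5.0;

// Chooses which data feeds the network fitness score for one host.
NetSource chooseNetworkSource(double nowSec, std::optional<double> lastRemosReportSec) {
    if (lastRemosReportSec && nowSec - *lastRemosReportSec < kMaxRemosAgeSec) {
        return NetSource::Remos;
    }
    return NetSource::HistoryServer;  // no Remos data, or it is too old
}
```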
The exemplary instance of the Hardware Broker FG40 is an object-oriented
multi-threaded application. At the highest level, the Hardware Broker object contains the
elements listed in Table II below. It will be noted that Table II contains a brief description of
each of these objects. Additional details are provided in CD-Appendix E.

Table II

| No. | Title | Description |
|---|---|---|
| 1 | Host Fitness Database object (FitnessDB class) | The Host Fitness Database object stores load history data and fitness score information for each host. The Host Fitness Database is updated and fitness scores are recalculated when new History Server Host Status Response Messages are received. For each host, a circular queue of host load history data (HostInstance class) is maintained, with the newest data being placed at the end of the queue; this history data is used for recalculating host fitness scores. The Host Fitness Database also contains a System Specification Library (SSL) object which is used to access SPEC rating information for the hosts. |
| 2 | Signal Registration object (SignalRegistry class) | The Signal Registration object allows a user-defined SIGINT signal handler to be registered in order to permit the Hardware Broker FG40 to be shut down gracefully. |
| 3 | Network Subnet Information Database object (SubnetDB class) | The Network Subnet Information Database object is used to store the IP address, maximum bandwidth, and MTU size for each network specified in the networks.dat file. This information is used for converting network packet load information to bytes/second network load information. |
| 4 | Remos Host Network Bandwidth Database object (RemosDB class) | The Remos Host Network Bandwidth Database object stores the latest Remos-reported network bandwidth information for each host being monitored. The information stored consists of available bandwidth as well as maximum potential bandwidth on a specific host network link. If Remos bandwidth information is available for a host and the latest data is less than 5 seconds old, the Remos data will be used for calculating the network fitness score for the host. |
| 5 | History Server Interface object (HistServInterface class) | The History Server Interface object inherits from the RMComms TCPCommClient class and is responsible for maintaining connections to the History Server(s), for registering status and message handler callback functions, for sending messages to the History Server(s), and for invoking the status and message handler callback functions when connections to History Servers are either established or broken or new messages are received from a History Server. |
| 6 | Instrumentation Graph Tool Display Interface object (InstrInterface class) | The Instrumentation Graph Tool Display Interface object inherits from the RMComms TCPCommServer class and is responsible for maintaining connections to Graph Tool Display(s), for registering status and message handler callback functions, for sending messages to the Graph Tool Display(s), and for invoking the status and message handler callback functions when connections to Graph Tool Displays are either established or broken or new messages are received from a Graph Tool Display. |
| 7 | Resource Manager Interface object (ResMgrInterface class) | The Resource Manager Interface object inherits from the RMComms TCPCommServer class and is responsible for maintaining connections to the Resource Manager, for registering status and message handler callback functions, for sending messages to the Resource Manager, and for invoking the status and message handler callback functions when connections to the Resource Manager are either established or broken or new messages are received from the Resource Manager. |
| 8 | Remos Broker Interface object (RemosInterface class) | The Remos Broker Interface object inherits from the RMComms TCPCommClient class and is responsible for maintaining connections to the Remos Broker, for registering status and message handler callback functions, for sending messages to the Remos Broker, and for invoking the status and message handler callback functions when connections to the Remos Broker are either established or broken or new messages are received from the Remos Broker. |
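The circular queue of host load history kept by the Host Fitness Database object (item 1 of Table II) can be sketched as a fixed-capacity ring buffer, one per host. The capacity, the `HostInstance` fields shown, and the averaging helper are assumptions for illustration; the table does not say which fields HostInstance holds or how the history is turned into fitness scores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical load sample; the real HostInstance fields are not given here.
struct HostInstance {
    double cpuLoad = 0.0;
    double timestampSec = 0.0;
};

// Fixed-capacity circular queue: oldest sample at index 0, newest at size()-1.
// When full, a new sample overwrites the oldest one.
class HostHistory {
public:
    explicit HostHistory(std::size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    void push(const HostInstance& sample) {
        if (count_ < buf_.size()) {
            buf_[(start_ + count_) % buf_.size()] = sample;
            ++count_;
        } else {
            buf_[start_] = sample;                // overwrite the oldest sample
            start_ = (start_ + 1) % buf_.size();  // next-oldest becomes the front
        }
    }

    std::size_t size() const { return count_; }

    // i = 0 is the oldest retained sample.
    const HostInstance& at(std::size_t i) const {
        assert(i < count_);
        return buf_[(start_ + i) % buf_.size()];
    }

    const HostInstance& newest() const { return at(count_ - 1); }

    // Example of using the history when recalculating a fitness score.
    double meanCpuLoad() const {
        if (count_ == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += at(i).cpuLoad;
        return sum / static_cast<double>(count_);
    }

private:
    std::vector<HostInstance> buf_;
    std::size_t start_ = 0;  // index of the oldest sample
    std::size_t count_ = 0;  // number of valid samples
};
```

A fitness database could then hold one `HostHistory` per host name, for example in a `std::map<std::string, HostHistory>`.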



FG44 - Quality-of-Service (QoS) Manager Function

The QoS Managers FG44A - FG44N are responsible for monitoring application-level
performance requirements, which requirements are defined in the System Specification Files
FG32 and are monitored primarily via instrumentation data obtained directly from the
application code. The QoS Managers FG44A - FG44N advantageously determine if
applications or application paths are satisfying their assigned requirements. When an
application is not meeting its performance requirements and the application is scalable (in
the sense that multiple copies can be run and the copies will perform load-sharing across the
copies), the QoS Managers FG44A - FG44N will either request that the Resource Manager
FG42 scale up a new copy of the application or move the application to a new host (which
hopefully will result in better performance). Moreover, if there are multiple copies of a
scalable application running and all copies are performing below the specified requirement
threshold, the QoS Managers FG44A - FG44N will request that the Resource Manager
shut down a specific copy.

The QoS Manager is a single-threaded application written in C/C++. It should be
noted that the application can be scaled for redundancy and/or load-sharing. In an
exemplary case, each copy of the QoS Manager monitors all of the requirements associated
with a single application path defined in the System Specification Files FG32. It will be
appreciated that the specific path to be monitored can be specified via command-line
parameters. By default, without specifying a path via the command-line, the QoS Manager
will monitor all requirements for all defined paths.

As mentioned above, the QoS Manager advantageously uses a sliding window
algorithm to determine when to declare that applications should be scaled up or scaled down.
The inputs to the algorithm define both high and low sampling window sizes, the maximum
number of allowed violations within the sampling window, and violation thresholds as a
percentage of the actual specified requirement value. It will be appreciated that the sliding
window algorithm was selected in an effort to damp out unexpected "noise" or "spikes" in
the measured performance data. Thresholds stated as a percentage of the actual
requirement value were selected in order to scale up, or scale down, prior to violating the
specified hard requirement. It will be understood that the success of this approach is highly
dependent on the rate of change and noisiness of the measured data.
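The sliding window rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the QoS Manager code: it assumes a metric where smaller values are better (such as latency against a maximum requirement), counts a sample above `highPct` times the requirement as a high (scale-up) violation and a sample below `lowPct` times the requirement as a low (scale-down) violation, and declares an action once the violations in the corresponding window exceed the allowed maximum. `WindowConfig`, `SlidingWindowMonitor`, and `ScaleAction` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

enum class ScaleAction { None, ScaleUp, ScaleDown };

// Algorithm inputs named in the text; the field names are hypothetical.
struct WindowConfig {
    std::size_t highWindow;         // samples in the scale-up window
    std::size_t lowWindow;          // samples in the scale-down window
    std::size_t maxHighViolations;  // allowed high-threshold violations per window
    std::size_t maxLowViolations;   // allowed low-threshold violations per window
    double highPct;                 // high threshold as a fraction of the requirement
    double lowPct;                  // low threshold as a fraction of the requirement
};

// Assumes a "smaller is better" metric (e.g. latency) with an upper-bound requirement.
class SlidingWindowMonitor {
public:
    SlidingWindowMonitor(double requirement, WindowConfig cfg)
        : requirement_(requirement), cfg_(cfg) {
        assert(cfg.highWindow > 0 && cfg.lowWindow > 0);
    }

    ScaleAction addSample(double value) {
        record(high_, value > cfg_.highPct * requirement_, cfg_.highWindow);
        record(low_, value < cfg_.lowPct * requirement_, cfg_.lowWindow);
        ScaleAction action = ScaleAction::None;
        if (violations(high_) > cfg_.maxHighViolations) {
            action = ScaleAction::ScaleUp;
        } else if (violations(low_) > cfg_.maxLowViolations) {
            action = ScaleAction::ScaleDown;
        }
        if (action != ScaleAction::None) {  // start fresh after declaring an action
            high_.clear();
            low_.clear();
        }
        return action;
    }

private:
    static void record(std::deque<bool>& window, bool violated, std::size_t size) {
        window.push_back(violated);
        if (window.size() > size) window.pop_front();  // slide the window
    }

    static std::size_t violations(const std::deque<bool>& window) {
        return static_cast<std::size_t>(std::count(window.begin(), window.end(), true));
    }

    double requirement_;
    WindowConfig cfg_;
    std::deque<bool> high_;
    std::deque<bool> low_;
};
```

With a 100 ms requirement, 90% and 30% thresholds, five-sample windows, and at most two allowed violations, three samples above 90 ms trigger a scale-up and three samples below 30 ms trigger a scale-down. Clearing both windows after an action is a design choice of the sketch; isolated spikes never accumulate enough violations to trigger anything.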

Again, the QoS Manager uses the RMComms middleware for all external
communication. Each copy of the QoS Manager talks to (1) the Resource Manager FG42,
(2) Program Control FG50, (3) QoS Specification Control (not shown), (4) QoS Monitor
FG29, (5) Instrumentation Correlators FG26A-FG26N, (6) Graph Tool Instrumentation
Displays FG69A-FG69N, and (7) History Servers FG12A-FG12N. In an exemplary case,
the QoS Managers FG44A - FG44N advantageously can receive configuration orders from
the Resource Manager FG42, which allows the Resource Manager FG42 to configure each
QoS Manager to monitor specific application paths and also to set the sliding window criteria
to be used by each respective QoS Manager.
Each copy of the QoS Manager advantageously can transmit application scale up and
scale down requests to the Resource Manager FG42 when the measured performance data
for a respective application violates either the high (scale up) or low (scale down) sliding
window criteria for a specific requirement. A scale up request indicates which application
on which host has violated the performance criteria, and a scale down request indicates
which application on which host is recommended to be shut down. Each copy of the QoS
Manager can also request that the Resource Manager move an application. This will occur
in the case where one copy of an application is performing much worse than all other running
copies. The move request is implemented as a scale up request followed by a scale down
request (of the badly performing copy); the scale down request is not transmitted to the
Resource Manager FG42 until the scale up action has been implemented.
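The ordering constraint on a move request can be captured by holding back the scale-down until Program Control reports the new copy. The sketch below is a hedged illustration: `Request`, `MoveCoordinator`, and the string request kinds are hypothetical, and because pending moves are keyed by application name it assumes at most one move per application is in flight.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical request record sent to the Resource Manager.
struct Request {
    std::string kind;  // "scale_up" or "scale_down"
    std::string app;
    std::string host;
};

// Sequences a move as a scale-up first, then a scale-down of the poorly performing copy.
class MoveCoordinator {
public:
    // Starts a move: only the scale-up request is sent now.
    std::vector<Request> requestMove(const std::string& app, const std::string& badHost) {
        pendingScaleDown_[app] = Request{"scale_down", app, badHost};
        return {Request{"scale_up", app, badHost}};
    }

    // Called when Program Control reports that a new copy of `app` has started.
    std::vector<Request> onCopyStarted(const std::string& app) {
        auto it = pendingScaleDown_.find(app);
        if (it == pendingScaleDown_.end()) return {};
        Request scaleDown = it->second;
        pendingScaleDown_.erase(it);
        return {scaleDown};
    }

private:
    std::map<std::string, Request> pendingScaleDown_;  // keyed by application name
};
```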

The QoS Managers FG44A - FG44N use the application "settling times" defined in
the System Specification Files to ensure that, once a requested action has been sent to the
Resource Manager, no additional actions are requested until after the application settling
time has elapsed. This provides time for initialization and configuration among the
application copies to occur. In future releases, the inter-application dependencies will be used
instead.
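A settling-time gate of the kind described above might look like the following sketch. The per-application settling times are assumed to have been read from the System Specification Files into a map, times are in seconds, and `SettlingGate` is a hypothetical name; an application without a configured settling time is not delayed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Suppresses new requests for an application until its settling time has elapsed.
class SettlingGate {
public:
    explicit SettlingGate(std::map<std::string, double> settlingSec)
        : settlingSec_(std::move(settlingSec)) {}

    bool mayRequest(const std::string& app, double nowSec) const {
        auto last = lastRequestSec_.find(app);
        if (last == lastRequestSec_.end()) return true;  // nothing outstanding
        auto s = settlingSec_.find(app);
        double settle = (s == settlingSec_.end()) ? 0.0 : s->second;
        return nowSec - last->second >= settle;
    }

    void recordRequest(const std::string& app, double nowSec) {
        lastRequestSec_[app] = nowSec;
    }

private:
    std::map<std::string, double> settlingSec_;     // from the specification files
    std::map<std::string, double> lastRequestSec_;  // time of the last request sent
};
```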

The division of responsibility between the QoS Managers FG44A - FG44N and the
Resource Manager FG42 is as follows:

(1) the QoS Managers FG44A - FG44N determine what actions would
potentially improve performance; and

(2) the Resource Manager FG42 has final authority to determine
whether to implement the requested actions.

It should be mentioned that there is a Request Acknowledge message from the Resource Manager FG42 which has been defined and implemented within the QoS Manager code. This message is intended to provide feedback to the QoS Manager indicating that the request had been successfully received and whether the Resource Manager FG42 intends to implement the request.

It should also be mentioned that the Resource Manager FG42 receives updated application survivability specifications from the QoS Specification Control component. This information overrides the application survivability information that was initially loaded in from the System Specification Files for the specified application. The information is used by the Resource Manager FG42 to determine whether the specific application will be restarted if it fails at run-time.

As previously mentioned, the QoS Managers FG44A - FG44N receive application status and state information from the Program Control function FG50. Program Control periodically sends application status updates for all running applications and also sends immediate indications of any applications which have been started or stopped. This information is used by the QoS Managers FG44A-FG44N, along with the instrumented performance data being received via the QoS Monitor FG29 and Instrumentation Correlators FG26A-FG26N, to determine the exact state of the monitored applications A1-NM that are running. This information is also used to determine when (and if) requested actions have been implemented by the Resource Manager FG42. The information is also used for setting up and discarding internal data structures used for monitoring the performance of each application.

As described above, the QoS Managers FG44A - FG44N communicate with the Resource Manager FG42, Program Control FG50, the QoS Specification Control (not shown), the QoS Monitor FG29, an Instrumentation Correlator (generally denoted FG24), a Graph Tool Instrumentation Display (generally denoted FG69), and the History Servers FG12A-FG12N using the RMComms middleware. The message formats and contents of each message that is exchanged between the QoS Managers FG44A - FG44N and these other functional components are described in greater detail in CD-Appendix F. Additional details regarding the timing and/or event trigger for each message are also described in the Appendix.

The QoS Managers FG44A - FG44N also receive application-level instrumentation data indicating current application performance values from the Instrumentation Correlators FG24A-FG24N, the Instrumentation Brokers FG26A-FG26N, and/or the Jewel Instrumentation Broker (QoS Monitor) FG29. The instrumentation data that is received contains (at a minimum):

(1) the timetag indicating when the data was generated;
(2) the hostname and IP address of the host where the application that the data is associated with is running;
(3) the process id (pid) of the application that the data is associated with; and
(4) the event number of the instrumentation message.

The event number of the instrumentation message specifies the type of instrumentation data that has been received, and the hostname, IP address, and pid are used, in conjunction with the application data received from Program Control, to determine the specific application that the data is associated with.

If the contents of the instrumentation message match any of the application performance requirements that are currently being monitored by the QoS Manager, the data value is added to the proper requirement sliding window for the specified application. The sliding window algorithm is then checked to determine if the new sample triggered a violation of either the high or the low sliding window. If a high threshold sliding window violation occurs and the application does not already have the maximum number of copies running, a determination is made as to whether performance can best be improved by starting a new copy of the application (scale up) or by moving an existing copy to a different host. The corresponding action recommendation will then be sent to the Resource Manager. In an exemplary case, the criterion for determining whether an application should be moved rather than scaled up is based on the relative performance of the replicated applications. Thus, if one application copy is performing much worse [> 50%] than the other copies, the recommendation will be to move the application. Likewise, if the new sample triggers a low threshold sliding window violation, and the application has more than the minimum number of copies running, a recommendation will be sent to the Resource Manager FG42 requesting that the copy of the application that is experiencing the worst performance be scaled down.

It will be appreciated from the discussion above that when a copy of the QoS Manager is first started, it reads in the System Specification Files FG32 (via calls to the System Specification Library (SSL) FG34), which contain the list of hosts that are known to be in the distributed environment and information on all applications that can be run in the distributed environment. The application-level information includes where specific applications can be run, which applications are scalable, which applications can be restarted, and any dependencies between applications.

FG3: SYSTEM SPECIFICATION LANGUAGE & SYSTEM SPECIFICATION LIBRARY (SSL) FUNCTIONS

In order to effectively manage a pool of computing resources, the Resource Manager FG42 requires some means or mechanism of determining the capabilities and configuration of the computing resources under its control, as well as the software components that need to be executed and the dependencies of these software components on both hardware and software resources. Additionally, the Resource Manager FG42 requires the capability to determine the expected mission-level and application-level requirements. Furthermore, the Resource Manager FG42 must be able to determine what control capabilities are available to be used to attempt to recover from fault or QoS violation conditions.

In order to address these needs, a System and Software Specification Grammar has been developed to capture the "static" information needed by the Resource Manager FG42 for effectively managing a pool of distributed resources. The grammar captures the following information:

• Hardware and Operating Systems
  • Hardware Configuration
  • Network Configuration
  • Operating System and Version
• Software
  • Systems, Subsystems, Applications, Processes
  • Resource Requirements
  • QoS Requirements (Events)
  • Survivability Requirements
• Path Information: Structure and QoS Requirements

As part of the grammar development effort, a specification library has also been developed that parses the specification files and provides an API for accessing the specification information. It will be noted that the specification library was written in C++ and has been ported to all development platforms, including Solaris 2.6, Solaris 2.7, IRIX 6.5, HP-UX 10.20, Red Hat Linux 6.0, and Windows NT 4.0. The library advantageously can be used by substantially all of the Resource Management functional elements, including Program Control FG50, Resource Manager FG42, Path QoS Managers, Hardware Broker FG40, and History Servers FG12A-FG12N.

As illustrated in FIG. 3, the API library consists of a yacc file FG302 that defines the BNF grammar, a lex file FG304 that defines the tokens of the language, and a set of C++ classes FG306 that store the spec file information. The lex file FG304 is compiled with the GNU tool flex FG310, which creates a C++ source file FG320. The GNU tool bison FG312 compiles the yacc file FG302 and creates C++ source and header files FG322 and FG324. It will be noted that the lex source file FG304 includes the header file FG322. The C++ compiler FG314 then compiles these source files to create lex and yacc objects FG330 and FG332. The C++ compiler FG314 also compiles the C++ storage classes FG334. All of these objects are linked into a single library FG34 to be utilized by an application. FIG. 3 illustrates this process flow.

The Software Specifications Grammar (SSG) provides techniques for describing the characteristics and requirements of dynamic, path-based real-time systems, as well as abstractions to describe the properties of the software, such as hierarchical structure, inter-connectivity relationships, and run-time execution constraints. The SSG also allows description of the physical structure or composition of the hardware, such as LANs, hosts, interconnecting devices or ICs (such as bridges, hubs, and routers), and their statically known properties (e.g., peak capacities). Furthermore, the Quality-of-Service (QoS) requirements on various system components advantageously can be described.

At the highest level, a specification consists of a collection of software systems, hardware systems, and network systems. The language rules for specifying systems are described generally below and in detail in CD-Appendix G. The system specification language hierarchy is shown below; selected details are presented immediately following it:

• Software Specifications
  • Application
    • Security
    • Configuration
    • Hardware Requirements
    • Startup Info
    • Dynamic Arguments
    • Shutdown Info
    • States
    • Dependencies
    • Initial Load Estimate
    • QoS Info
    • Survivability
    • Scalability
• Hardware Specifications
  • Host Info
  • Network Info
    • LANs
    • Network Devices (Interconnects)
• Path Specifications
  • Data Flow Graph
  • Data Flow Info
  • QoS Requirements

It will be appreciated that a software specification is a collection of software systems, each of which consists of one or more software subsystems. Specification files are provided by the developer to capture as much knowledge about their software system as possible. These files provide a model of the actual systems which can be used by the Resource Manager FG42 at run-time.

In contrast, an application is an executable program that can be started as an autonomous process on a host. Application attributes include all information necessary to start up and shut down the application. The associated startup and shutdown blocks describe how to start and stop the application and include information such as the directory and name of the application, command line options, and environment variable settings.

An application instantiates an SSL object by calling its constructor. This parses the spec files in the specified directory and populates the object hierarchy to provide the data to the application. The SSL class contains an SSL_Container member that holds the spec file data in its lists and maps. All the systems from the spec files are contained in the appropriate list: software systems in the swSysList, hardware systems in the hwSysList, and network systems in the nwSysList. The pathList contains all the paths in the spec files. The hostList contains all the hosts in the spec files; this list is also available from the entries in hwSysList. The processList contains a list of processes from the CONFIGURATION block. Moreover, it should be noted that one or more configuration blocks can exist per application. For example, an application that runs on more than one platform would have multiple CONFIGURATION blocks with different platforms in each HARDWARE block.

The application startup block contains all the information necessary to start an application, automatically or manually. This information includes the supported hardware (host) type, operating-system type, and operating-system version(s). This may be further constrained by an optional list of the names of hosts that can run the application. The startup information also includes the working directory for reading and writing data files, the name of the executable, and an ordered list of arguments that must be passed on the command line when the application is started. Last is a list of processes expected to be seen on the system when the application is running.

An application shutdown block indicates the command(s) to be used for termination of the application. A shutdown command may be a POSIX signal name or may be a shell script or batch file. Supported signals include SIGKILL, SIGQUIT, SIGHUP, SIGUSR1, SIGUSR2, SIGSTOP, SIGINT, and SIGTERM. The ShutdownTime parameter is the maximum time to wait for the application to die gracefully before forcing the application to terminate via the SIGKILL signal.
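The shutdown rule (deliver the configured signal, wait up to ShutdownTime, then fall back to SIGKILL) can be sketched for POSIX hosts as follows. This is a hypothetical helper, not the Program Control implementation; it assumes the application was started as a child of the calling process so that waitpid() can observe its exit, and it covers only the signal form of a shutdown command, not shell scripts or batch files.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sends `shutdownSignal` to child process `pid`, polls for its exit until
// `shutdownTime` has passed, and then forces termination with SIGKILL.
// Returns true only if the process exited on its own before the deadline;
// returns false if it had to be killed or could not be signalled at all.
bool shutdownApplication(pid_t pid, int shutdownSignal,
                         std::chrono::milliseconds shutdownTime) {
    assert(pid > 0);
    if (kill(pid, shutdownSignal) != 0) return false;
    const auto deadline = std::chrono::steady_clock::now() + shutdownTime;
    while (std::chrono::steady_clock::now() < deadline) {
        int status = 0;
        if (waitpid(pid, &status, WNOHANG) == pid) return true;  // exited gracefully
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
    kill(pid, SIGKILL);        // ShutdownTime expired: force termination
    waitpid(pid, nullptr, 0);  // reap the killed process
    return false;
}
```

Polling with WNOHANG keeps the sketch simple and portable across the POSIX platforms listed; an implementation could instead block in waitpid() with an alarm, at the cost of more signal handling.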

Other blocks are available. For example, a dependency block indicates any dependencies the application may have with the startup and/or shutdown of other applications (e.g., it may be required that a particular application be started before another application can be started). It will be noted that the dependency block is used by both Application Control FG50 and the Resource Manager FG42 to determine whether or not it is safe to start an application, stop an application, or let an application continue to run.
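A dependency block can be evaluated with simple checks like those in the hypothetical sketch below. It assumes the block has been reduced to a map from each application to the applications that must already be running before it starts; the function names and the application names used in the usage are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Application -> applications that must already be running before it starts.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Safe to start only if every prerequisite of `app` is currently running.
bool safeToStart(const std::string& app, const DependencyMap& startAfter,
                 const std::set<std::string>& running) {
    auto it = startAfter.find(app);
    if (it == startAfter.end()) return true;  // no dependencies declared
    for (const auto& dep : it->second)
        if (running.count(dep) == 0) return false;
    return true;
}

// Safe to stop only if no other running application depends on `app`.
bool safeToStop(const std::string& app, const DependencyMap& startAfter,
                const std::set<std::string>& running) {
    for (const auto& entry : startAfter) {
        if (entry.first == app || running.count(entry.first) == 0) continue;
        for (const auto& dep : entry.second)
            if (dep == app) return false;
    }
    return true;
}
```

With "display" depending on "tracker" and "tracker" depending on "radarFeed", the tracker may start once radarFeed runs, and radarFeed may not be stopped while the tracker is running.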



The scalability specification for an application indicates whether an application can be scaled via replication. Scalable applications are programmed to exploit load sharing among replicas, and can adapt dynamically to varying numbers of replicas. The specification also indicates whether an application combines its input stream (which may be received from different predecessor applications and/or devices) and whether it splits its output stream (which may be distributed to different successor applications and/or devices). "Combining" and "splitting" are commonly called "joining" and "forking", respectively, in parallel computing.



Specification files advantageously can be provided to describe a given set of networks that exist in a distributed runtime environment. A network system specification describes the LANs and ICs (interconnection devices such as switches, hubs and routers). A system consists of one or more subsystems. A subsystem may contain LANs (each with a peak bandwidth specification) and ICs (each containing a description of network membership).
Advantageously, a real-time QoS requirement specification includes timing constraints such as simple deadlines, inter-processing times, and throughputs. A simple deadline is defined as the maximum end-to-end path latency during a cycle from the beginning to the end of the path. Inter-processing time is defined as the maximum allowable time between processing of a particular element in the path. The throughput requirement is defined as the minimum number of data items that the path must process during a unit period of time. Each timing constraint specification may also include items that relate to the dynamic monitoring of the constraint. These include minimum and maximum slack values (that must be maintained at run-time), the size of a moving window of measured samples that should be observed, and the maximum tolerable number of violations (within the window).
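As a rough sketch of how a simple-deadline constraint with a moving window and a maximum tolerable number of violations might be monitored, the hypothetical class below treats a sample as a violation when its slack (deadline minus measured latency) falls below the minimum slack. Only the minimum slack value is modeled, and the class name and interface are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Monitors a simple deadline over a moving window of latency samples.
class DeadlineMonitor {
public:
    DeadlineMonitor(double deadline, double minSlack, std::size_t windowSize,
                    std::size_t maxViolations)
        : deadline_(deadline), minSlack_(minSlack), windowSize_(windowSize),
          maxViolations_(maxViolations) {
        assert(windowSize_ > 0);
    }

    // Adds one end-to-end latency sample and returns true if the number of
    // violating samples in the current window exceeds the tolerable maximum.
    bool addSample(double latency) {
        const bool violation = (deadline_ - latency) < minSlack_;
        window_.push_back(violation);
        if (window_.size() > windowSize_) window_.pop_front();  // slide the window
        std::size_t count = 0;
        for (bool v : window_) count += v ? 1 : 0;
        return count > maxViolations_;
    }

private:
    double deadline_, minSlack_;
    std::size_t windowSize_, maxViolations_;
    std::deque<bool> window_;  // true = sample violated the slack requirement
};
```

With a 100 ms deadline, zero minimum slack, a window of 4 samples, and at most 1 tolerated violation, two late samples inside the same window flag the constraint, and the flag clears once one of them slides out.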

CD-Appendix G describes a specification grammar for declaring requirements on applications in a dynamic, distributed, heterogeneous resource pool. The grammar allows the description of environment-dependent application features, which allows for the modeling and dynamic resource management of such systems.

A common API was developed to allow Resource Management functions access to the information contained in the spec files. This object-oriented API is, in an exemplary case, written in C++, with libraries ported to all supported platforms. The object is populated by parsing the spec files using the BNF grammar defined in lex and yacc syntax and compiled with the GNU tools flex and bison, as discussed above. Actual population occurs in the semantic actions of the yacc file.
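The object model that these semantic actions populate can be pictured with a small, hypothetical C++ sketch. It borrows the member names given for the SSL (SSL_System with its type, parent pointer, name, sysList and compList, and the swSysList, hwSysList and nwSysList held by the container), but the SystemType enumeration, the addChild and addSystem helpers, and the std::unique_ptr ownership are assumptions for illustration, not the library's actual interface.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

enum class SystemType { Software, Hardware, Network };

// Generic node for a software, hardware, or network system.
struct SSL_System {
    SystemType type;
    SSL_System* parent = nullptr;  // enclosing system of the same type, if nested
    std::string name;
    std::list<std::unique_ptr<SSL_System>> sysList;  // child systems
    std::list<std::string> compList;                 // e.g. host names for hardware

    SSL_System(SystemType t, std::string n) : type(t), name(std::move(n)) {}

    // Adds a nested child system of the same type and returns it.
    SSL_System* addChild(const std::string& childName) {
        sysList.push_back(std::make_unique<SSL_System>(type, childName));
        sysList.back()->parent = this;
        return sysList.back().get();
    }
};

// Top-level lists, one per system type.
struct SSL_Container {
    std::list<std::unique_ptr<SSL_System>> swSysList, hwSysList, nwSysList;

    SSL_System* addSystem(SystemType t, const std::string& n) {
        auto& target = (t == SystemType::Software) ? swSysList
                     : (t == SystemType::Hardware) ? hwSysList
                                                   : nwSysList;
        target.push_back(std::make_unique<SSL_System>(t, n));
        return target.back().get();
    }
};
```

In a parser, a semantic action for a hardware system block would call addSystem (or addChild for a nested block) and then append each host it reads to compList.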

The SSL_System class is a generic class that can hold data for a software system, hardware system, or network system. The type member describes the type of system it contains. It also contains a pointer to its parent (which allows for nested systems of the same type) and the name of the system. The sysList contains its SSL_System children, and the compList contains a list of the system's components (a list of hosts, for a hardware system, for example).

Preferably, the Application Program Interface (API) for the System Specification Library (SSL) FG34 uses the C++ Standard Template Library for data structures such as linked lists and hash tables (maps). An application first instantiates the SSL object by calling its constructor with the name of the directory where the specification files reside. This object contains functions that allow setting this directory after calling its constructor (setSpecDir(directory name)), clearing the object of all currently held data (clear()), parsing a specific file (parseSpec(filename)), and rebuilding the object (rebuild(), which implicitly clears the object first). Once instantiated, this object provides access to the data in the specification files. CD-Appendix G provides additional discussion regarding this aspect of the SSL. It will be appreciated that the SSL object provides methods that return all the data it contains. For example, getSWSystems returns an STL list of all the software systems specified in the specification files. Each entry in this list provides its data by methods such as getSysName(), and the set of application components (ApplicationSpec) that make up the system. All data can be retrieved in this manner.

FG1: HOST AND NETWORK MONITORING FUNCTIONAL GROUP

As mentioned above, extensive monitoring capabilities are provided in the Resource Management architecture at the host and network levels. The information monitored includes statuses, configuration information, performance metrics, and detected fault conditions. Moreover, the Host and Network functional group FG1 consists of four components, including:

1) Host Monitors FG10A-FG10N, which reside on each machine in the distributed environment, collect extensive operating system-level data for each host (CPU and memory usage, etc.), and provide it to the History Servers via the RMComms TCPCommServer middleware;

2) History Servers FG12A-FG12N, which collect data from the Host Monitors via an RMComms TCPCommClient, maintain status and performance histories on each host in the distributed environment, and provide this information to displays and other Resource Management components using an RMComms TCPCommServer;

3) A Host Discovery function FG14 that uses SNMP (Simple Network Management Protocol) calls and ICMP ping calls to determine when new hosts come on-line and if existing hosts go down, and provides this information to Program Control via an RMComms TCPCommServer; and

4) A Remos Network Data Broker FG16 that collects information on network link bandwidths from Carnegie Mellon University's SNMP-based Remos tool and passes this information by way of an RMComms TCPCommServer to the Host Load Analyzer component of the Resource Allocation Decision-Making subsystem.

It will be appreciated that network information is collected both by the Remos Broker FG16 and indirectly via the Host Monitors FG10A-FG10N. See FIGS. 2A-2B. The Remos Broker FG16 accesses the Remos network information via the Remos API. As mentioned previously, Remos uses SNMP calls to the LAN switches and hosts. The Host Discovery function FG14 uses both SNMP and ICMP (ping) calls to each host A-N to determine if one or more new hosts have come on-line or previously discovered hosts have gone down. The Host Monitors FG10A-FG10N employ operating system calls to gather host and network performance statistics. Internally, the History Servers FG12A-FG12N collect data from the Host Monitors FG10A-FG10N. The Monitoring functional group provides its information to the rest of the Resource Management components using RMComms TCPCommServer objects, which are discussed in detail elsewhere. The Remos Broker FG16 sends data to the Host Load Analyzer FG40, the History Servers FG12A-FG12N send data to the Display functional group FG6 and the Host Load Analyzer FG40, and the Host Discovery function FG14 provides Program Control FG50 with information on detected or faulted hosts. Additional details on these functional elements are provided immediately below.

FG10A-FG10N Host Monitors

For monitoring the status and performance of hosts, a Host Monitor process runs on each machine within the distributed environment. These Host Monitors FG10A-FG10N use operating system-level mechanisms to retrieve status, configuration, and performance information for each host A-N. The information retrieved includes 1) operating system version and machine configuration, 2) CPU configuration, status, and utilization, 3) memory configuration and usage, 4) network configuration, status, and utilization, 5) filesystem configuration, status, and utilization, and 6) process statuses, including CPU, memory, network, and filesystem utilization for each process. While the Host Monitors are primarily responsible for monitoring the status of a particular host, they also provide information on network load as seen by that host. In the same manner, the Host Monitors FG10A-FG10N also provide information and statistics concerning any remotely mounted filesystems (e.g., NFS).

Preferably, the information the Host Monitors FG10A-FG10N collect is formatted into operating system-independent message formats. These message formats attempt to provide a pseudo-standardized set of state, status, and performance information which is useful to other components of the Resource Management architecture, such that the other components do not have to be aware of or deal with the minor deltas between data formats and semantics. Since not all of the state and performance data is available on every platform, a group of flags is set in the host configuration message indicating whether specific data items are valid on a particular platform.
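The validity flags can be represented as a bitmask carried in the host configuration message. The sketch below is hypothetical: the field names, bit positions, and struct layout are assumptions chosen for illustration, not the actual message format documented in the appendices.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// One bit per platform-dependent data item (illustrative selection).
enum HostField : std::uint32_t {
    kCpuUtil    = 1u << 0,
    kMemUsed    = 1u << 1,
    kPagingRate = 1u << 2,
    kNetBytesIn = 1u << 3,
};

// OS-independent host configuration message: consumers check the flags
// instead of knowing which operating system produced the data.
struct HostConfigMessage {
    std::string osName;            // e.g. "Solaris 2.7"
    std::uint32_t validFlags = 0;  // bitwise OR of HostField values

    void markValid(HostField f) { validFlags |= f; }
    bool provides(HostField f) const { return (validFlags & f) != 0; }
};
```

A consumer such as a display would then skip, for example, the paging-rate chart for any host whose configuration message does not set kPagingRate.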




It will be appreciated that the Host Monitors FG10A-FG10N have a very specific interface with the History Servers FG12A-FG12N. Each Host Monitor periodically (once a second) sends its data to all History Servers connected to it (this is transparent, being a property of the RMComms TCPCommServer); the History Servers make no requests to the Host Monitors.



More specifically, the Host Monitors FG10A-FG10N have been implemented in C++. This decision allows for a completely modular design in which platform-specific code can be restricted to a small number of modules. This approach alleviates the problems associated with porting to various platforms. Currently there is support for Sun SPARC based architectures running Solaris 2.6 and 2.7, Silicon Graphics MIPS based architectures running IRIX 6.5, Hewlett Packard PA-RISC based architectures running HP-UX 10.20, and Pentium based architectures running both Windows NT 4.0 Workstation and Red Hat Linux 6.0. The Host Monitor source compiles under the native compilers provided by Sun Microsystems and Silicon Graphics for their respective platforms. The GNU C++ compiler (version 2.8.1) may also be used on Hewlett Packard PA-RISC based architectures under HP-UX 10.20 and on Red Hat Linux. Microsoft Visual C++ compiles the Windows NT Host Monitor. All Host Monitors utilize the I/O library package supported by the Resource Management (RM) group under the NSWC High Performance Distributed Computing (HiPer-D) initiative.
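The modular approach can be illustrated with an abstract interface that isolates the platform-specific code. In this hypothetical sketch the portable core only calls HostProbe, and each port (Solaris, IRIX, HP-UX, Windows NT, Linux) would supply its own subclass; FakeProbe is a stand-in so the sketch runs anywhere and does not read real operating system data.

```cpp
#include <cassert>
#include <string>

// Interface implemented once per platform; the rest of the monitor is portable.
struct HostProbe {
    virtual ~HostProbe() = default;
    virtual std::string platform() const = 0;
    virtual double cpuUtilization() = 0;  // percent busy since the last call
};

// Stand-in implementation returning a fixed value. A real port would query
// the operating system (for example /proc on Linux or kstat on Solaris).
class FakeProbe : public HostProbe {
public:
    explicit FakeProbe(double util) : util_(util) {}
    std::string platform() const override { return "test"; }
    double cpuUtilization() override { return util_; }

private:
    double util_;
};

// Portable code: formats a report without knowing which platform it runs on.
std::string summarize(HostProbe& probe) {
    const int cpu = static_cast<int>(probe.cpuUtilization());  // truncates
    return probe.platform() + " cpu=" + std::to_string(cpu) + "%";
}
```

Under this arrangement, adding a platform means adding one subclass and selecting it at build time; the message formatting and I/O code stay unchanged.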

The Host Monitors FG10A-FG10N accumulate data on a periodic interval specified at invocation. System process table data is accumulated and then filtered to eliminate "uninteresting" processes (usually meaning processes belonging to user ID 0 or 1). It is important to note that system-wide data is accumulated and processed before the filtering stage, so as to ensure a complete picture of system-wide performance. This system-wide data, along with the filtered process list, is then made available to the I/O module for subsequent transmission to client applications.
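The ordering matters because totals computed after filtering would under-report the load generated by processes owned by user ID 0 or 1. A minimal sketch of the accumulate-then-filter step follows; the ProcessSample and HostSnapshot types are hypothetical, and only CPU percentage is aggregated.

```cpp
#include <cassert>
#include <vector>

struct ProcessSample {
    int pid;
    int uid;
    double cpuPercent;
};

struct HostSnapshot {
    double totalCpuPercent = 0.0;          // summed over the full process table
    std::vector<ProcessSample> processes;  // filtered per-process list
};

HostSnapshot buildSnapshot(const std::vector<ProcessSample>& table) {
    HostSnapshot snap;
    // 1) System-wide accumulation over every process.
    for (const auto& p : table) snap.totalCpuPercent += p.cpuPercent;
    // 2) Only then drop "uninteresting" processes owned by user ID 0 or 1.
    for (const auto& p : table)
        if (p.uid != 0 && p.uid != 1) snap.processes.push_back(p);
    return snap;
}
```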



The History Server function of Resource Management acts as a data broker between daemons monitoring individual hosts, known as host monitors FG10A-FG10N, and other functional components of Resource Management. The host monitors collect performance information (such as CPU utilization and process status data) from hosts of various platforms (SGI, SUN, HP, Windows NT, and Linux). The host monitors use an RMComms TCPCommServer object to distribute this data. For further information, refer to the host monitor and RMComms documentation. The History Servers FG12A-FG12N collect and store this data from the host monitors FG10A-FG10N and distribute it to other Resource Management clients, such as the Host Displays FG62A-FG62N, Graph Display FG69A-FG69N, Path Display FG64, and the Hardware Broker FG40.

Each History Server has two modes of operation relating to fault tolerance, scalability, and workload distribution between multiple instances of History Servers. The first mode determines at initialization (through command line arguments or a default) the set of hosts to monitor, and this set remains static for the life of the History Server process. The second mode recognizes the existence of other History Server processes and coordinates between them. It allows for dynamic changing of the set of hosts each History Server monitors (for example, two History Servers each monitor half of the hosts; a third History Server starts, and all three History Servers reconfigure to each monitor one third of the hosts). This also allows History Servers to preserve the data they collected by sending it to the others, providing fault tolerance.
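One simple way for coordinating History Servers to split the host list is a deterministic round-robin partition, recomputed whenever the number of live servers changes. The text does not specify the assignment algorithm, so the modulo scheme below is an assumption; it presumes every server sees the same ordered host list (for example, from the SSL) and knows its own index among the live servers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the hosts that server `serverIndex` of `serverCount` should monitor.
std::vector<std::string> hostsForServer(const std::vector<std::string>& hosts,
                                        std::size_t serverIndex,
                                        std::size_t serverCount) {
    assert(serverCount > 0 && serverIndex < serverCount);
    std::vector<std::string> mine;
    for (std::size_t i = 0; i < hosts.size(); ++i)
        if (i % serverCount == serverIndex) mine.push_back(hosts[i]);
    return mine;
}
```

With six hosts and two servers, each server takes three hosts; when a third server starts, each recomputes its share and ends up with two. Transferring already-collected history to the new owner, as the fault-tolerance discussion describes, is not shown.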

The History Server function is written in C++ with an object-oriented design. The main routine processes the command line arguments, retrieves the list of hosts to monitor using an SSL object, instantiates the main History_Server object, and spawns the Collector, Distributor, Communicator, and Display threads. These threads share the main History_Server object. The Collector thread is responsible for collecting and storing data from the host monitors. The Distributor thread processes requests from RM clients. The Communicator thread waits for events with other History Servers and takes appropriate actions, including triggering the Display thread to update the History Server Display.

FG12A-FG12N: History Servers

The History Servers FG12A-FG12N are responsible for collecting information from the Host Monitors and maintaining histories on the statuses, statistics, and performance of each host in the distributed environment. This information can be requested by other Resource Management functional groups. Currently, the primary consumers of the status information are the Host Load Analyzer (Hardware Broker) FG40 component of the Resource Allocation Decision-Making functional group FG4, the Host Display(s) FG62A-FG62N, and the Path Display FG64. The Host Load Analyzer FG40 receives information on host configuration and loads (primarily CPU, memory, and network data) and uses this to assign host fitness scores. The Host Displays FG62A-FG62N receive and display current host status information, process status information, and network connectivity information. It should be mentioned that the Host Display can also request that the History Servers provide CPU load information, network load information, paging activity data, and memory utilization information, which is used to drive line graph charts for specific hosts selected at the Host Display.

The History Servers FG12A-FG12N are designed so that multiple copies can be run simultaneously. Each History Server can be configured either to monitor all Host Monitors FG10A-FG10N or to monitor only a selected subset of the Host Monitors. It will be noted that the History Servers FG12A-FG12N determine the list of hosts in the distributed environment that could potentially be monitored from the System Specification Library (SSL). In this manner, the History Servers FG12A-FG12N can be used to provide survivability (by having multiple History Servers FG12A-FG12N connected to each Host Monitor) and/or to perform load-sharing (with the History Servers FG12A-FG12N each monitoring only a subset of the Host Monitors). The History Servers FG12A-FG12N can also be configured to periodically record history data to disk. These disk files can then be used for off-line analysis.
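A History Server's per-host history can be kept as a bounded buffer of recent samples, which is what a display needs to drive a line graph for a selected host. The sketch below is hypothetical: it stores a single CPU-load value per sample, keeps only the newest `capacity` samples per host, and omits the periodic recording to disk.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>

class HostHistory {
public:
    explicit HostHistory(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Appends a sample for `host`, discarding the oldest one when full.
    void record(const std::string& host, double cpuLoad) {
        auto& h = history_[host];
        h.push_back(cpuLoad);
        if (h.size() > capacity_) h.pop_front();
    }

    // Returns the retained samples for `host`, oldest first (empty if unknown).
    std::deque<double> samples(const std::string& host) const {
        auto it = history_.find(host);
        if (it == history_.end()) return {};
        return it->second;
    }

private:
    std::size_t capacity_;
    std::map<std::string, std::deque<double>> history_;
};
```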

FG14 - Host Discovery

The Host Discovery function FG14 advantageously can use a Perl script that makes SNMP (Simple Network Management Protocol) calls and ICMP ping calls. These calls are used to periodically scan each subnet and host address in the distributed environment to determine whether there have been any host status changes. In an exemplary case, the list of hosts and subnets that are to be monitored is read in from a file.

The Host Discovery function FG14 issues MIB-II SNMP queries to obtain information on the hosts A-N on the network. When a new host is first detected, the new host's operating system configuration is queried via SNMP calls. Information on the newly discovered host and its operating system configuration is then sent to the Program Control function FG50. Likewise, when a host fails to respond to multiple SNMP and ping queries, a message indicating that the host appears to have gone down is sent to the Program Control function.
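The detection rule (report a host when it first answers, and report it down only after it misses several consecutive polls) can be sketched as a small state tracker. The miss limit, the string results standing in for messages to Program Control, and the class name are assumptions; the actual SNMP and ICMP probing is left to the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

class HostTracker {
public:
    explicit HostTracker(int missLimit) : missLimit_(missLimit) {}

    // Records the outcome of one SNMP/ping poll of `host`. Returns "new" when a
    // host is first seen (or answers again after being declared down), "down"
    // when it reaches `missLimit` consecutive misses, and "" otherwise.
    std::string poll(const std::string& host, bool responded) {
        auto it = hosts_.find(host);
        if (responded) {
            const bool isNew = (it == hosts_.end() || it->second.down);
            hosts_[host] = State{0, false};
            return isNew ? "new" : "";
        }
        if (it == hosts_.end() || it->second.down) return "";  // nothing to report
        if (++it->second.misses >= missLimit_) {
            it->second.down = true;
            return "down";
        }
        return "";
    }

private:
    struct State {
        int misses;
        bool down;
    };
    int missLimit_;
    std::map<std::string, State> hosts_;
};
```

Requiring several consecutive misses avoids reporting a host as down because of a single lost UDP or ICMP packet.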

The Host Discovery function FG14 interfaces with Program Control FG50 using a C++ wrapper class around the Perl script. This wrapper class contains an RMComms TCPCommServer, making the data collected by the SNMP calls available to the rest of the Resource Management components.
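A thin C++ wrapper around an external script is commonly built on POSIX popen(). The hypothetical sketch below runs a command and returns its output lines; the RMComms TCPCommServer that would publish the results is not modeled, and the command used in the usage example is a stand-in for the actual Perl script.

```cpp
#include <cassert>
#include <stdio.h>  // POSIX popen/pclose
#include <string>
#include <utility>
#include <vector>

class ScriptWrapper {
public:
    explicit ScriptWrapper(std::string command) : command_(std::move(command)) {}

    // Runs the command once via the shell and returns its standard output,
    // split into lines. Returns an empty vector if the pipe cannot be opened.
    std::vector<std::string> runOnce() const {
        std::vector<std::string> lines;
        FILE* pipe = popen(command_.c_str(), "r");
        if (!pipe) return lines;
        std::string current;
        int ch;
        while ((ch = fgetc(pipe)) != EOF) {
            if (ch == '\n') {
                lines.push_back(current);
                current.clear();
            } else {
                current.push_back(static_cast<char>(ch));
            }
        }
        if (!current.empty()) lines.push_back(current);  // last line without '\n'
        pclose(pipe);
        return lines;
    }

private:
    std::string command_;
};
```

A wrapper of this kind might call runOnce() on each scan interval and forward any changed host entries to its clients.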

FG16 - Remos Network Data Broker

The final functional component of the Host and Network Monitoring functional group is the Remos Network Data Broker FG16, which receives information on network link bandwidth and network link bandwidth utilization from the SNMP-based Remos network monitoring tool, as shown in FIGS. 2A, 2B and/or FIG. 14. The network information is accessed via the Remos API library and is then sent on to the Host Load Analyzer (Hardware Broker) function FG40 of the Resource Allocation Decision-Making functional group FG4 using an RMComms TCPCommServer. Remos works by using SNMP to query the switches (via the bridge collector) to collect information on network configuration as well as bandwidth utilization on each link, and it also uses SNMP MIB-II queries to each host to collect the host's view of network utilization. The network information received from Remos consists of the maximum potential bandwidth and the current bandwidth utilization on specific host network links.

The Remos Broker FG16 provides this network link information for each host. The data is sent to the Host Load Analyzer (Hardware Broker) approximately every 2 seconds. The Remos Broker FG16 uses configuration files listing the specific hosts and switches that should be queried.

The functions implemented by the Host Monitor functional group FG1 have been designed to provide a system monitoring capability not normally supplied by standard SVR4 or BSD Unix services. Such services include cross-platform reporting of system process loading, CPU performance, network performance, and periodic status summary reporting. The Host Monitors were developed to support efforts by the HiPer-D Resource Management group, attempting to provide a common set of OS-level parameters useful for assessing host and network load and status, for supporting resource allocation/reallocation algorithms, and attempting to provide a minimally intrusive, close to real-time capability for gathering this data.

Host Discovery Design

The Host Discovery function FG14 of the Resource Management architecture provides resource discovery of hosts on a network. It identifies new hosts that come online or previously known hosts that have gone offline. The Host Discovery component can determine the hostname, the operating system name and version, and in some cases the machine architecture and manufacturer of a newly discovered host. This information is sent to Program Control so the new host can be added to the pool of resources.

The Host Discovery functional element FG14 consists of a Perl script that contains the resource discovery functionality, and a C++ object that receives the output of the Perl script and provides this information to Program Control via an RMComms TCPCommServer connection. This is described in CD-Appendix H. More specifically, the Perl script host_discovery.pl issues ICMP (ping) calls and MIB-II SNMP queries to discover new hosts. On initialization, the script populates a data structure called Net_info for each of the networks (subnets) it needs to monitor. Currently this information is hard-coded: the subnet is defined as 172.30.1, and the lower and upper limits for the host are 1 and 234, respectively. It then initializes the global variables for the server host and port, network domain, and the executable path for the ping (fping) command.

The host_discovery.pl script establishes a baseline of existing hosts using the current set of hosts that answer the fping call. For each network/subnet defined in its list of Net_info (Net_info.pm) data structures, it calls fping and builds a list of IP addresses of hosts that answered the ping, known as reachable hosts, and a list of those hosts that did not answer the ping. For each reachable host, a Host_info (Host_info.pm) data structure is populated to store the host's information. (Key fields in the Host_info data structure include IP address, hostname, operating system and version, architecture class, and manufacturer.) Since the IP address of the reachable host is known, a call to gethostbyaddr() is used to get the hostname. Other information for the host is obtained by making a MIB-II (Management Information Base version 2) System Group (Object ID 1.3.6.1.2.1.1.1.0) SNMP call to the SNMP agent on each reachable host. This SNMP query returns information on the configuration of a specific network device (in this case, the configuration of each reachable host).

The host_discovery.pl script makes SNMP calls by using subroutines freely available for public use (freeware) created by Simon Leinen. These subroutines are contained in the files BER.pm and SNMP_Session.pm. The SNMP_Session is configurable for specifying timeouts and the number of retries before declaring a host unavailable, and for specifying the SNMP Object ID (OID).

Additional general and specific details regarding functional elements of the Host and Network Monitoring functional group FG1 are provided in CD-Appendix H.

FG2: Instrumentation functional group

As mentioned above, the NSWC-DD Instrumentation System provides general-purpose application event reporting and event correlation capabilities. The Instrumentation System forms an architecture that allows instrumented application data to be easily accessible by other components of the Resource Management architecture. The major functional components of the Instrumentation System architecture are the following:

1) The Instrumentation API Libraries, which are linked with the applications and provide the function call interfaces by which the application sends instrumentation data.

2) An Instrumentation Daemon, one copy of which resides on each host in the distributed environment and is responsible for reading instrumentation data sent by the applications, reformatting the data into instrumentation event messages and sending the messages to the Instrumentation Collectors.

3) The Instrumentation Collectors, which connect to the Instrumentation Daemons on each host and receive instrumentation messages from all hosts. The Collectors forward received messages to the Instrumentation Correlators and Instrumentation Brokers.

4) The Instrumentation Correlators, which receive instrumentation messages from the Instrumentation Collectors and provide grammar-driven capabilities for correlating, combining, and reformatting application data into higher-level metrics (composite events) for use by displays or other Resource Management components.

5) The Instrumentation Brokers, which receive instrumentation messages from the Instrumentation Collectors and perform task-specific reformatting and data manipulation for driving displays or other Resource Management components.

6) The Jewel Instrumentation Broker (QoS Monitor), which is a legacy component that can receive instrumentation data from the open-source Jewel instrumentation package or from the Instrumentation Collectors. The QoS Monitor performs task-specific message reformatting and data manipulation for driving displays and the QoS Managers.



Instrumentation API Library 

The applications link to the Instrumentation API Library and make API calls to construct and send out instrumentation event messages. Three separate APIs are provided for use by the applications: 1) a printf()-style API, which allows the code to format, build, and send instrumentation data with a single function call; 2) a buffer-construction-style API, where multiple function calls are made to construct the instrumentation buffer iteratively, one data element per call; and 3) a Jewel function call API based on the existing API provided by the Jewel instrumentation package (an open-source package produced by the German National Research Center for Computer Science). The first two APIs are the preferred programming interfaces and take advantage of several key new instrumentation features. It will be appreciated that the Jewel API is provided solely for backwards compatibility with existing instrumented application code and is implemented as a set of wrappers around the printf()-style API. All three APIs are supported for C and C++. Ada bindings have been produced for the buffer-construction-style API and the Jewel function call API.
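To make the difference between the first two interfaces concrete, the Python sketch below produces the same event once through a printf()-style call and once through buffer construction, and layers a Jewel-style call on top of the printf()-style one. It is only an illustration under stated assumptions: the real libraries are C/C++ (with Ada bindings), and the names `instr_printf`, `InstrBuffer` and `jewel_event`, the `%d`/`%f`/`%s` subset, the `(type, value)` element encoding and the `sink` callback (standing in for the FIFO write to the Instrumentation Daemon) are all hypothetical.

```python
import re
from typing import Any, Callable, List, Tuple

Element = Tuple[str, Any]                    # (type tag, value)
Sink = Callable[[int, List[Element]], None]  # stands in for the write to the daemon

_SPEC = re.compile(r"%([dfs])")
_CONVERT = {"d": ("int", int), "f": ("float", float), "s": ("str", str)}


def instr_printf(sink: Sink, event_id: int, fmt: str, *args: Any) -> None:
    """printf()-style: format, build, and send an event in a single call."""
    specs = _SPEC.findall(fmt)
    if len(specs) != len(args):
        raise ValueError("argument count does not match format string")
    elements: List[Element] = []
    for spec, arg in zip(specs, args):
        tag, convert = _CONVERT[spec]
        elements.append((tag, convert(arg)))
    sink(event_id, elements)


class InstrBuffer:
    """Buffer-construction style: add one data element per call, then send."""

    def __init__(self, event_id: int) -> None:
        self.event_id = event_id
        self._elements: List[Element] = []

    def add_int(self, value: int) -> "InstrBuffer":
        self._elements.append(("int", int(value)))
        return self

    def add_float(self, value: float) -> "InstrBuffer":
        self._elements.append(("float", float(value)))
        return self

    def add_str(self, value: str) -> "InstrBuffer":
        self._elements.append(("str", str(value)))
        return self

    def send(self, sink: Sink) -> None:
        sink(self.event_id, list(self._elements))
        self._elements.clear()


def jewel_event(sink: Sink, event_id: int, *values: int) -> None:
    """Compatibility wrapper layered on the printf()-style call (argument list assumed)."""
    instr_printf(sink, event_id, " ".join("%d" for _ in values), *values)
```

Because both preferred interfaces reduce to the same list of typed elements, the daemon side needs only one decoder; the buffer style simply trades a single call for incremental construction when the data elements become available at different points in the application code.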

The instrumented data is sent from the application to the Instrumentation Daemon on the same host. The current mechanism for data transfer is via UNIX FIFO IPC (inter-process communication) mechanisms. The FIFO mechanism was chosen based on reliability, low overhead and ease of implementation. Future implementations of the Instrumentation System may explore alternate data passing mechanisms, including shared message queues.
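A minimal Python sketch of the FIFO hand-off follows, assuming newline-delimited text records and a hypothetical FIFO name; the actual record encoding used between the API library and the daemon is not specified here. `os.mkfifo` (Unix only) creates the named pipe, the daemon side blocks in a reader thread, and the application side writes its records and closes the pipe, which delivers end-of-file to the reader.

```python
import os
import tempfile
import threading
from typing import List


def read_records(fifo_path: str, out: List[str]) -> None:
    """Daemon side: block on the FIFO and collect one record per line until EOF."""
    with open(fifo_path, "r", encoding="utf-8") as fifo:
        for line in fifo:
            out.append(line.rstrip("\n"))


def write_records(fifo_path: str, records: List[str]) -> None:
    """Application side: write each record as one line, then close (reader sees EOF)."""
    with open(fifo_path, "w", encoding="utf-8") as fifo:
        for record in records:
            fifo.write(record + "\n")


def fifo_round_trip(records: List[str]) -> List[str]:
    """Create a FIFO, run both ends, and return what the reader received."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "instr.fifo")  # hypothetical FIFO name
        os.mkfifo(path)
        received: List[str] = []
        reader = threading.Thread(target=read_records, args=(path, received))
        reader.start()
        write_records(path, records)  # open() blocks until the reader has opened its end
        reader.join()
        return received
```

This sketch has a single writer. When several applications share one FIFO, POSIX guarantees only that a single write() of at most PIPE_BUF bytes is atomic, so each record would need to go out in one such write to avoid interleaving.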

Instrumentation Daemon

An Instrumentation Daemon resides on each host in the distributed environment. The Instrumentation Daemon is interrupted when new data is written to the FIFO. The Instrumentation Daemon reads the data from the FIFO, reformats the data into the standard internal Instrumentation message format and sends the data to each of the Instrumentation Collectors that are currently active. (For future implementations, an event request filtering mechanism will be implemented so that specific event messages will only be sent to those Instrumentation Collectors that have requested the message.)
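The daemon's per-record work (reformat, then fan out to every active Collector) can be sketched in Python as follows. The raw `<event_id> <payload>` record format, the fields of `InstrMessage` and all class and method names are assumptions, since the standard internal message format is not spelled out here. The optional `wanted` set models the future event request filter; leaving it as `None` reproduces the current send-to-all behavior.

```python
import itertools
import socket
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class InstrMessage:
    """ASSUMED layout of the standard internal instrumentation message."""
    host: str
    seq: int
    timestamp: float
    event_id: int
    payload: str


Send = Callable[[InstrMessage], None]


class InstrumentationDaemon:
    def __init__(self, host: Optional[str] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self.host = host or socket.gethostname()
        self._clock = clock
        self._seq = itertools.count(1)
        # collector name -> (send callback, requested event ids; None means "all")
        self._collectors: Dict[str, Tuple[Send, Optional[Set[int]]]] = {}

    def add_collector(self, name: str, send: Send,
                      wanted: Optional[Set[int]] = None) -> None:
        self._collectors[name] = (send, wanted)

    def remove_collector(self, name: str) -> None:
        self._collectors.pop(name, None)

    def reformat(self, raw: str) -> InstrMessage:
        """Turn an assumed raw record '<event_id> <payload>' into a standard message."""
        event_text, _, payload = raw.partition(" ")
        return InstrMessage(self.host, next(self._seq), self._clock(),
                            int(event_text), payload)

    def handle_record(self, raw: str) -> InstrMessage:
        """Called whenever a record arrives from the FIFO."""
        msg = self.reformat(raw)
        for send, wanted in list(self._collectors.values()):
            # wanted=None: current behavior, every active Collector gets every message
            if wanted is None or msg.event_id in wanted:
                send(msg)
        return msg
```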

Instrumentation Collectors

The Instrumentation Collectors receive instrumentation messages from the Instrumentation Daemons on each host in the distributed environment. Currently, the Instrumentation Collectors send every instrumentation message to all Instrumentation Brokers and Instrumentation Correlators that have connected to the Instrumentation Collector. (For future implementations, an event request filtering mechanism will be implemented so that specific event messages will only be sent to those Instrumentation Brokers and Instrumentation Correlators that have requested the message.) For now, the Instrumentation Collector serves as a pass-through server for instrumentation messages. The Instrumentation Collector does support architecture scalability, in the sense that without the Instrumentation Collectors, each Instrumentation Broker and Instrumentation Correlator would need to maintain connections to the Instrumentation Daemons on every host.
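The scalability argument can be checked with simple arithmetic. The Python sketch below counts connections for H hosts (one daemon each) and B consumers (Brokers plus Correlators), assuming each Collector connects to every daemon and each consumer attaches to exactly one Collector; the function names are illustrative only.

```python
def direct_connections(hosts: int, consumers: int) -> int:
    """Without Collectors: every Broker/Correlator connects to the daemon on every host."""
    return hosts * consumers


def collector_connections(hosts: int, consumers: int, collectors: int = 1) -> int:
    """With Collectors: each Collector connects to every daemon, and each
    Broker/Correlator attaches to exactly one Collector (an assumption)."""
    if collectors < 1:
        raise ValueError("need at least one collector")
    return collectors * hosts + consumers
```

For example, 50 hosts and 10 consumers need 50 × 10 = 500 connections without Collectors, versus 50 + 10 = 60 with a single Collector; with two Collectors the count is 2 × 50 + 10 = 110. Each daemon also sends every message once per Collector rather than once per consumer.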

Instrumentation Correlators

The Instrumentation Correlators provide grammar-driven capabilities for correlating, combining, and reformatting application data into higher-level metrics (composite events) for use by displays or other Resource Management components. Each Correlator reads in a user-specified correlation grammar file that is interpreted at run time by the Correlator's instrumentation correlation engine.
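The correlation grammar itself is not given here, so the Python sketch below invents a one-line rule syntax purely for illustration (`NAME: START_ID -> END_ID by KEY_FIELD`) to show the general idea of a run-time-interpreted rule turning raw events into a composite event: here, the elapsed time between a start event and an end event that share the same key value. The rule syntax, `Event`, `Rule`, `parse_grammar` and `Correlator` are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# HYPOTHETICAL rule syntax:  NAME: START_ID -> END_ID by KEY_FIELD
_RULE = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*->\s*(\d+)\s+by\s+(\w+)\s*$")


@dataclass
class Event:
    event_id: int
    time: float
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    start_id: int
    end_id: int
    key: str


def parse_grammar(text: str) -> List[Rule]:
    """Read rules at run time; '#' starts a comment."""
    rules: List[Rule] = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0]
        if not line.strip():
            continue
        m = _RULE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        name, start, end, key = m.groups()
        rules.append(Rule(name, int(start), int(end), key))
    return rules


class Correlator:
    """Pairs start/end events that share a key value into composite events."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        self._pending: Dict[Tuple[str, str], float] = {}  # (rule, key value) -> start time

    def feed(self, ev: Event) -> List[Tuple[str, str, float]]:
        """Return composite events (rule name, key value, elapsed time) completed by `ev`."""
        out: List[Tuple[str, str, float]] = []
        for rule in self.rules:
            key_value = ev.fields.get(rule.key)
            if key_value is None:
                continue
            if ev.event_id == rule.start_id:
                self._pending[(rule.name, key_value)] = ev.time
            elif ev.event_id == rule.end_id and (rule.name, key_value) in self._pending:
                started = self._pending.pop((rule.name, key_value))
                out.append((rule.name, key_value, ev.time - started))
        return out
```

A rule such as `latency: 10 -> 11 by track` would then turn event 10 at t = 1.0 s and event 11 at t = 1.5 s for the same track into a composite `latency` event of 0.5 s.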

Instrumentation Brokers 

The Instrumentation Brokers are task-specific applications built around a common code package. The Instrumentation Brokers receive instrumentation messages from the Instrumentation Collectors, filter all received instrumentation messages to find the messages of interest, and perform task-specific message data reformatting and manipulation for driving other components such as displays or other Resource Management components. The Instrumentation Broker approach allows instrumentation data sources to be quickly integrated for test, display, and debugging purposes. (As the Instrumentation Correlator grammar and correlation engine mature in future releases, it is anticipated that the Instrumentation Broker approach will be used less frequently.)
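The "common code package plus task-specific logic" structure can be sketched in Python as a generic broker parameterized by a filter, a reformatting function and an output. The dictionary message shape, the `cpu_load` event name and `make_cpu_display_broker` are hypothetical examples, not part of the described system.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]


class InstrumentationBroker:
    """Shared code: keep messages of interest, reformat them, forward the result."""

    def __init__(self, interested: Callable[[Message], bool],
                 reformat: Callable[[Message], Any],
                 output: Callable[[Any], None]) -> None:
        self._interested = interested
        self._reformat = reformat
        self._output = output

    def on_message(self, msg: Message) -> bool:
        """Called for every message received from a Collector; True if it was forwarded."""
        if not self._interested(msg):
            return False
        self._output(self._reformat(msg))
        return True


def make_cpu_display_broker(output: Callable[[str], None]) -> InstrumentationBroker:
    """HYPOTHETICAL task-specific broker: CPU-load events -> display strings."""
    return InstrumentationBroker(
        interested=lambda m: m.get("event") == "cpu_load",
        reformat=lambda m: f"{m['host']}: {100 * m['load']:.0f}% CPU",
        output=output,
    )
```

Only the three small functions change from one broker to the next, which is what makes this approach quick for test and debugging hookups.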



Jewel Instrumentation Broker (QoS Monitor)

The Jewel Instrumentation Broker (hereafter referred to as the QoS Monitor) is a legacy architecture component that served as a broker between the Jewel instrumentation package components and Resource Management components and displays. The QoS Monitor was responsible for polling the Jewel Collector components to retrieve application event messages. These messages were then reformatted and used to drive several displays and the QoS Managers. The Jewel instrumentation package has now been replaced in all applications; however, the message reformatting capabilities of the QoS Monitor have been maintained so that several displays and the existing QoS Manager interface do not have to be upgraded immediately. The QoS Monitor component has been modified so that it receives instrumentation data from both Jewel and the Instrumentation Collectors.

Middleware

The RMComms middleware package, which is described in the RMComms Middleware Design Report, provides the internal message passing interfaces between the Resource Management components connected via the network. The middleware provides automatic, location-transparent, many-to-many client-server connections. Low-overhead, reliable message passing capabilities are provided. Registration of message handler callback functions for specified requested message types is provided, with the message handler functions being invoked when messages arrive. Registration of connection status callback functions, which are invoked when either new connections are made or existing connections are broken, is also provided. The middleware package also allows for multiple client and server objects to be instantiated in the same application, is thread-safe, and provides an easy-to-use object-oriented API through which all capabilities are accessed.
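The callback-registration pattern described above can be sketched in Python as follows. This is not the RMComms API (its actual interface is listed in Appendix C); the class and method names are hypothetical. A lock guards registration, and callbacks are copied out and invoked outside the lock so that a handler may register further callbacks without deadlocking.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[str, Any], None]          # (message type, payload)
StatusCallback = Callable[[str, bool], None]  # (peer name, connected?)


class MessageDispatcher:
    """Per-type handler registry plus connection-status callbacks (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._status_callbacks: List[StatusCallback] = []

    def register_handler(self, msg_type: str, handler: Handler) -> None:
        with self._lock:
            self._handlers[msg_type].append(handler)

    def register_status_callback(self, callback: StatusCallback) -> None:
        with self._lock:
            self._status_callbacks.append(callback)

    def deliver(self, msg_type: str, payload: Any) -> int:
        """Invoke every handler registered for msg_type; return how many ran."""
        with self._lock:
            handlers = list(self._handlers.get(msg_type, ()))
        for handler in handlers:
            handler(msg_type, payload)
        return len(handlers)

    def connection_changed(self, peer: str, connected: bool) -> None:
        """Report a new or broken connection to every status callback."""
        with self._lock:
            callbacks = list(self._status_callbacks)
        for callback in callbacks:
            callback(peer, connected)
```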

Additional details regarding the Instrumentation functional group FG2 are provided in CD-Appendix I.

FG42: Resource Manager 

The Resource Manager 42 is the primary decision-making component of the Resource Management toolkit. It is responsible for 1) responding to application and host failures by determining if and what recovery actions should be taken, 2) determining if and where to place new copies of scalable applications, or which scalable applications should be shut down, when the QoS Managers FG44A-FG44N indicate that scale-up or scale-down actions should be taken based on measured application performance, 3) determining where new applications should be placed when requested to do so by Program Control, and 4) determining which and how many applications should run based on application system (mission) priorities. In order to accomplish these tasks, the Resource Manager 42 maintains a global view of the state of the entire distributed environment, including status information on all hosts, networks, and applications. In addition, the Resource Manager 42 also calculates software and hardware readiness metrics and reports these readiness values for display purposes. FIGS. 1A, 1B show the connectivity and high-level data flow between the Resource Manager 42 and the other Resource Management-related components.

The Resource Manager 42 receives status and failure information about hosts, networks, and applications from Program Control. This information includes periodic status updates as well as immediate updates when statuses change, such as a new host being detected or an application failing. In the case of applications going down, information as to whether the applications were shut down on purpose or whether they failed is also sent. Program Control also issues requests to the Resource Manager 42 when new applications need to be dynamically allocated and when Program Control determines that the Resource Manager 42 needs to assess and attempt to resolve inter-application dependencies (such as an application which needs to be running prior to starting up another application).

The Resource Manager 42 responds to failed applications and hosts by determining whether the failed applications can and should be restarted and attempting to determine where (and if) there are hosts available that the application can run on. When a decision is made by the Resource Manager 42, a message is sent to Program Control specifying what application to start and where to put it. The same general mechanism is used when Program Control requests that the Resource Manager 42 determine where to start new applications and/or how to resolve inter-application dependencies; the Resource Manager 42 responds with orders indicating what applications to start and where to start them. The Resource Manager 42 also sends application shutdown orders to Program Control requesting that certain applications be stopped; this can occur when the QoS Managers FG44A-FG44N indicate that certain scalable applications have too many copies running, or when application system priority changes (to lower priorities) occur, resulting in scaling back the application system configuration.
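Resolving inter-application dependencies amounts to ordering start-ups so that every prerequisite is running first. A minimal Python sketch, assuming dependencies are given as a mapping from each application to the applications it requires (the mapping and the function name `startup_order` are hypothetical), uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which also reports circular dependencies by raising `CycleError`.

```python
from graphlib import CycleError, TopologicalSorter
from typing import AbstractSet, Collection, Dict, List, Set


def startup_order(requires: Dict[str, Collection[str]], target: str,
                  running: AbstractSet[str] = frozenset()) -> List[str]:
    """Applications to start, prerequisites first, so that `target` can run.

    Raises CycleError if the dependencies are circular.
    """
    # Collect the target and all of its transitive prerequisites.
    needed: Set[str] = set()
    stack = [target]
    while stack:
        app = stack.pop()
        if app not in needed:
            needed.add(app)
            stack.extend(requires.get(app, ()))
    # Map each needed application to its prerequisites and order them.
    graph = {app: set(requires.get(app, ())) for app in needed}
    order = TopologicalSorter(graph).static_order()
    # Applications that are already running need no start order.
    return [app for app in order if app not in running]
```

For example, if a display requires a tracker, which requires a comms process and a database, and the comms process also requires the database, the only valid order is database, comms, tracker, display.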

The Resource Manager 42 receives host load and host fitness information on all known hosts from the Hardware Broker 40 (Host Load Analyzer). This information includes overall host fitness scores, CPU-based fitness scores, network-based fitness scores, and memory and paging-based fitness scores, along with the SPEC95 ratings of the hosts. This information is received approximately once a second and includes information on all known hosts in the distributed system. These scores are used by the Resource Manager 42 for determining the "best" hosts for placing new applications when: 1) responding to requests from the QoS Managers FG44A-FG44N to scale up additional copies of an application, 2) attempting to restart failed applications, 3) responding to requests to dynamically allocate certain applications, and 4) responding to application system (mission) priority changes which require scaling up additional applications.
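How the Resource Manager combines the individual scores is not stated here. Purely as an assumption, the Python sketch below ranks eligible hosts by overall fitness (higher taken as better) and breaks ties with the SPEC95 rating; `exclude` can hold, for example, a failed host or hosts already running a copy. The ranked list also yields "next best" choices of the kind reported to the decision review display. `HostFitness`, `rank_hosts` and `best_host` are hypothetical names.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass(frozen=True)
class HostFitness:
    """One host's entry in a Hardware Broker report (fields taken from the text)."""
    host: str
    overall: float
    cpu: float
    network: float
    memory: float
    spec95: float


def rank_hosts(reports: Iterable[HostFitness], eligible: AbstractSet[str],
               exclude: AbstractSet[str] = frozenset(), top_n: int = 3) -> List[HostFitness]:
    """Eligible hosts, best first: overall fitness, then SPEC95 rating as tie-breaker."""
    candidates = [r for r in reports if r.host in eligible and r.host not in exclude]
    candidates.sort(key=lambda r: (r.overall, r.spec95), reverse=True)
    return candidates[:top_n]


def best_host(reports: Iterable[HostFitness], eligible: AbstractSet[str],
              exclude: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Host to place the application on, or None if no eligible host is available."""
    ranked = rank_hosts(reports, eligible, exclude, top_n=1)
    return ranked[0].host if ranked else None
```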

The Resource Manager 42 receives requests from the QoS Managers FG44A-FG44N for scaling up, moving, or scaling down specific applications. The Resource Manager FG42 responds to these requests by determining whether the request should be acted upon and, if so, determines the specific action to take and issues orders to Program Control to start up or shut down specific applications on specific hosts. The QoS Managers FG44A-FG44N are responsible for monitoring specific system performance metrics (e.g., quality of service, or QoS, requirements) via instrumentation and determining if performance can be improved by scaling up or moving certain applications. When this occurs, the QoS Managers FG44A-FG44N send a request to the Resource Manager FG42 indicating that a new copy of a specific application should be started. If the QoS Managers FG44A-FG44N determine that the performance of a scalable application can be improved by moving an application, a scale-up request is first sent to the Resource Manager FG42 and, when the new application has been started, a scale-down request is then sent to the Resource Manager. Also, when the QoS Managers FG44A-FG44N determine that there are more copies of a scalable application running than are needed, requests to shut down specific applications are sent to the Resource Manager FG42. The division of responsibility is that the QoS Managers FG44A-FG44N determine what actions would potentially improve performance, but the Resource Manager FG42 has final authority to determine whether to implement the requested actions.
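The division of responsibility, and the scale-up-then-scale-down sequence used for a move, can be sketched in Python as below. The min/max copy limits used as the Resource Manager's decision rule, the class names and the synchronous `request` call are assumptions for illustration; in the described system the scale-down request is sent only after the new copy has actually started.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScalingPolicy:
    min_copies: int
    max_copies: int


class ResourceManagerStub:
    """Holds final authority over QoS Manager requests (HYPOTHETICAL decision rule)."""

    def __init__(self, policies: Dict[str, ScalingPolicy], running: Dict[str, int]) -> None:
        self.policies = policies
        self.running = running  # application -> copies currently running

    def request(self, action: str, app: str) -> bool:
        """Approve or refuse a 'scale_up' / 'scale_down' request."""
        policy = self.policies.get(app)
        if policy is None:  # not a scalable application: refuse
            return False
        copies = self.running.get(app, 0)
        if action == "scale_up" and copies < policy.max_copies:
            self.running[app] = copies + 1  # stands in for a start order to Program Control
            return True
        if action == "scale_down" and copies > policy.min_copies:
            self.running[app] = copies - 1  # stands in for a shutdown order
            return True
        return False


def move(rm: ResourceManagerStub, app: str) -> bool:
    """QoS Manager side of a move: scale up first, then ask to scale down."""
    if not rm.request("scale_up", app):
        return False
    # In the described system this is sent only after the new copy has started.
    return rm.request("scale_down", app)
```

Issuing the scale-up first means the application never drops below its current number of copies during the move; if the scale-down were refused, the net result would simply be one extra copy.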

When the Resource Manager FG42 is first started, it reads in the System Specification Files (via System Specification Library, SSL, calls), which contain the list of hosts that are known to be in the distributed environment and information on all applications that can be run in the distributed environment. The System Specification Files also include application-level information, including where specific applications can be run, which applications are scalable, which applications can be restarted, and any dependencies between applications.

The Resource Manager FG42 can also receive updated application survivability specifications from the Specification Control component. This information overrides, for the specified applications, the application survivability information that was initially loaded in from the System Specification Files. The information is used by the Resource Manager FG42 to determine whether the specific applications will be restarted if they fail.
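A small Python sketch of the override, assuming survivability is reduced to a single `restartable` flag and using hypothetical `AppSpec`, `apply_survivability_update` and `should_restart` names (the actual System Specification File contents are covered in Appendix G), shows the intended effect: only the named applications change, and everything else keeps the values loaded at start-up.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class AppSpec:
    """Application-level entries named in the text (field layout ASSUMED)."""
    name: str
    allowed_hosts: FrozenSet[str]
    scalable: bool
    restartable: bool
    depends_on: FrozenSet[str] = frozenset()


def apply_survivability_update(specs: Dict[str, AppSpec],
                               updates: Dict[str, bool]) -> Dict[str, AppSpec]:
    """Return a new table in which only the named applications' `restartable` flag changes."""
    merged = dict(specs)
    for name, restartable in updates.items():
        if name not in merged:
            raise KeyError(f"unknown application {name!r}")
        merged[name] = replace(merged[name], restartable=restartable)
    return merged


def should_restart(specs: Dict[str, AppSpec], name: str) -> bool:
    """Consulted when an application fails."""
    spec = specs.get(name)
    return spec is not None and spec.restartable
```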



The Resource Manager FG42 sends application system and hardware system readiness and system (mission) priority information to the Readiness Broker and to the Globus Broker. The Readiness Broker is responsible for driving a GUI/display which shows the current readiness data and allows the system (mission) priorities to be changed and sent back to the Resource Manager FG42. The Globus Broker provides basically the same functionality, except that only a subset of the readiness data provided to the Readiness Broker is provided to the Globus Broker. The readiness information sent to the Readiness Broker consists of readiness values for each application, application subsystem, and application system defined in the System Specification Files. The readiness scores are currently based on the status (up/down) of the applications within a system or subsystem, along with the percentage of potential copies of scalable applications that are currently running. Host and network readiness scores are also calculated; these scores are determined based on the host load information and host fitness scores received from the Hardware Broker 40.
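The text gives the inputs to the readiness scores but not the exact formula. Under the assumption that a non-scalable application scores 1 when up and 0 when down, a scalable one scores the fraction of its potential copies that are running, and higher levels average their members, a Python sketch might look like this (`AppStatus`, `app_readiness` and `rollup` are hypothetical).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Tuple


@dataclass
class AppStatus:
    up: bool
    scalable: bool = False
    running_copies: int = 0
    potential_copies: int = 1


def app_readiness(status: AppStatus) -> float:
    """Up/down for ordinary applications; fraction of copies running for scalable ones."""
    if not status.scalable:
        return 1.0 if status.up else 0.0
    if status.potential_copies <= 0:
        return 0.0
    running = min(status.running_copies, status.potential_copies)
    return running / status.potential_copies


def rollup(system: Dict[str, Dict[str, AppStatus]]) -> Tuple[Dict[str, float], float]:
    """Average application scores per subsystem, then subsystem scores for the system."""
    per_subsystem = {
        name: mean(app_readiness(s) for s in apps.values()) if apps else 0.0
        for name, apps in system.items()
    }
    overall = mean(per_subsystem.values()) if per_subsystem else 0.0
    return per_subsystem, overall
```

For instance, a subsystem with one application up and one down scores 0.5, a scalable tracker running 3 of 4 potential copies scores 0.75, and a system made of those two subsystems scores 0.625.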

The Resource Manager FG42 also sends information about allocation and reallocation decisions to the Resource Management Decision Review Display. Information on the decision that was made, what event the decision was in response to, and how long it took both to make the decision and to implement it is sent to the display. In addition, information about the top choices for where an application could potentially have been placed is also sent (if applicable); this information includes the host fitness scores for the selected host and for the next-best hosts which could have been selected.

See CD-Appendix M for additional details regarding Resource Manager FG42.



In the Background section of the application, the reader may have interpreted the sentence "The present invention relates generally to resource management systems by which networked computers cooperate in performing at least one task too complex for a single computer to perform" to indicate that the Resource Management Architecture is limited to such applications. However, while the Resource Management Architecture generally supports tasks distributed across multiple hosts, it is not limited to only those tasks that must be distributed due to the inability to run them on a single machine. Moreover, the Resource Management functional elements advantageously could be used to control a set of applications which all run on the same machine while still providing monitoring, fault tolerance, etc. (albeit that this is not the normal or even the intended configuration). Furthermore, the Resource Management Architecture, as discussed above, deals with resource managed applications, where the managed characteristic may be one of scalability, survivability, fault tolerance or priority.

FIG. 15 is a block diagram of a CPU-based system 400, corresponding to one or more of the hosts A-N. The system 400 includes a central processing unit (CPU) 402, e.g., a microprocessor, that communicates with the RAM 412 and an I/O device 408 over a bus 420. It must be noted that the bus 420 may be a series of buses and bridges commonly used in a processor-based system, but for convenience purposes only, the bus 420 has been illustrated as a single bus. A second I/O device 410 is provided in an exemplary case. The processor-based system 400 also includes a primary memory 412 and an additional memory 414, which could be either a read-only memory (ROM) or another memory device, e.g., a hard drive or the like. The CPU-based system may include peripheral devices such as a floppy disk drive 404, a compact disk (CD) ROM drive 406, a display (not shown), a keyboard (not shown), and a mouse (also not shown), that communicate with the CPU 402 over the bus 420 as is well known in the art. It will be appreciated that either one of the memories 412 or 414 advantageously can be employed to store computer readable instructions for converting the general purpose system 400 into one of the hosts A-N. It will also be appreciated that the nature of the distributed environment permits the necessary applications and APIs needed to implement the Resource Management Architecture to be stored anywhere on the network.

Table III

APPENDIX  TYPE      DESCRIPTION
A         Attached  Resource Management Architecture Function List
B         Attached
C         Attached  API Listing for RMComms
D         CD        Resource Manager Interface Messages
E         CD        Host Load Analyzer (Hardware Broker) Function
F         CD        Quality-of-Service (QoS) Manager Function
G         CD        FG3: System Specification Language & System Specification Library (SSL) Functions
H         CD        Host and Network Monitoring Functional Group
I         CD
J         CD        Display Functional Group
K         CD
L         CD        System Readiness Display
M         CD        Resource Manager FG42
N         CD
O         CD        Host Discovery Function
P         CD
Q         CD        Program Control Application Controller
R         CD        Program Control Display
S         CD        Program Control Functional Group
T         CD        QoS Manager
U         CD        Resource Allocation Decision-Making Functional Group

Table III provides a listing of the Appendices included for all purposes in the application. It will be noted that the majority of the listed Appendices are provided on the CD-ROM filed concurrently with the application. In addition, the CD-ROM also includes the source code listing for the Resource Management Architecture according to the present invention.

Although presently preferred embodiments of the present invention have been described in detail herein, it should be clearly understood that many variations and/or modifications of the basic inventive concepts herein taught, which may appear to those skilled in the pertinent art, will still fall within the spirit and scope of the present invention, as defined in the appended claims.